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Abstract 
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^| This article is a pedagogical review of Monte Carlo methods for the self- 

avoiding walk, with emphasis on the extraordinarily efficient algorithms devel- 
oped over the past decade. 
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1 Introduction 



1.1 Why is the SAW a Sensible Model? 

The self-avoiding walk (SAW) was first proposed nearly half a century ago as a 
model of a linear polymer molecule in a good solvent [f, 2]. At first glance it seems 
to be a ridiculously crude model, almost a caricature: Real polymer molecules 1 live 
in continuous space and have tetrahedral (109.47°) bond angles, a non-trivial energy 
surface for the bond rotation angles, and a rather complicated monomer-monomer 
interaction potential. By contrast, the self- avoiding walk lives on a discrete lattice 
and has non-tetrahedral bond angles (e.g. 90° and 180° on the simple cubic lattice), an 
energy independent of the bond rotation angles, and a repulsive hard-core monomer- 
monomer potential. 

In spite of these rather extreme simplifications, there is now little doubt that the 
self-avoiding walk is not merely an excellent but in fact a perfect model for some (but 
not all!) aspects of the behavior of linear polymers in a good solvent. 2 This apparent 
miracle arises from universality, which plays a central role in the modern theory of 
critical phenomena [3, 4]. In brief, critical statistical-mechanical systems are divided 
into a small number of universality classes, which are typically characterized by 
spatial dimensionality, symmetries and other rather general properties. In the vicin- 
ity of a critical point (and only there), the leading asymptotic behavior is exactly 
the same (modulo some system-dependent scale factors) for all systems of a given 
universality class; the details of chemical structure, interaction energies and so forth 
are completely irrelevant (except for setting the nonuniversal scale factors). More- 
over, this universal behavior is given by simple scaling laws, in which the dependent 
variables are generalized homogeneous functions of the parameters which measure the 
deviation from criticality. 

The key question, therefore, is to determine for each physical problem which 
quantities are universal and which are nonuniversal. 

To compute the nonuniversal quantities, one employs the traditional methods of 
theoretical physics and chemistry: semi-realistic models followed by a process of suc- 
cessive refinement. All predictions from such models must be expected to be only 
approximate, even if the mathematical model is solved exactly, because the mathe- 
matical model is itself only a crude approximation to reality. 

To compute the universal quantities, by contrast, a very different approach is 
available: one may choose any mathematical model (the simpler the better) belonging 
to the same universality class as the system under study, and by solving it determine 
the exact values of universal quantities. Of course, it may not be feasible to solve this 
mathematical model exactly, so further approximations (or numerical simulations) 
may be required in practice; but these latter approximations are the only sources of 
error in the computation of universal quantities. At a subsequent stage it is prudent 

1 More precisely, linear polymers whose backbones consist solely of carbon-carbon single bonds. 

2 Here "good solvent" means "at any temperature strictly above the theta temperature for the 
given polymer-solvent pair" . 
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to test variants and refinements of the original model, but solely for the purpose of 
determining the boundaries of the universality class: if the refined model belongs to 
the same universality class as the original model, then the refinement has zero effect 
on the universal quantities. 

The behavior of polymer molecules as the chain length tends to infinity is, it turns 
out, a critical phenomenon in the above sense [5]. Thus, it is found empirically — 
although the existing experimental evidence is admittedly far from perfect [6, 7, 8, 
9, fO] — that the mean-square radius of gyration (R 2 g ) of a linear polymer molecule 
consisting of N monomer units has the leading asymptotic behavior 



as N — > oo, where the critical exponent v ~ 0.588 is universal, i.e. exactly the same 
for all polymers, solvents and temperatures (provided only that the temperature is 
above the theta temperature for the given polymer-solvent pair). The critical am- 
plitude A is nonuniversal, i.e. it depends on the polymer, solvent and temperature, 
and this dependence is not expected to be simple. 

There is therefore good reason to believe that any (real or mathematical) linear 
polymer chain which exhibits some flexibility and has short-range 3 , predominantly 
repulsive 4 monomer-monomer interactions lies in the same universality class as the 
self-avoiding walk. This belief should, of course, be checked carefully by both numeri- 
cal simulations and laboratory experiments; but at present there is, to my knowledge, 
no credible numerical or experimental evidence that would call it into question. 

1.2 Numerical Methods for the Self- Avoiding Walk 

Over the decades, the SAW has been studied extensively by a variety of methods. 
Rigorous methods have thus far yielded only fairly weak results [11]; the SAW is, to 
put it mildly, an extremely difficult mathematical problem. Non-rigorous analytical 
methods , such as perturbation theory and self-consistent-field theory, typically break 
down in precisely the region of interest, namely long chains [12]. The exceptions are 
methods based on the renormalization group (RG) [13, 14, 15], which have yielded 
reasonably accurate estimates for critical exponents and for some universal amplitude 
ratios [16, 17, 18, 19, 20, 21, 22, 23, 24]. However, the conceptual foundations of the 
renormalization-group methods have not yet been completely elucidated [25, 26]; and 
high-precision RG calculations are not always feasible. Thus, considerable work has 
been devoted to developing numerical methods for the study of long SAWs. These 
methods fall essentially into two categories: exact enumeration and Monte Carlo. 

In an exact-enumeration study, one first generates a complete list of all SAWs 
up to a certain length (usually N 15-35 steps), keeping track of the properties of 

3 Here I mean that the potential is short-range in physical space. It is of course — and this is a 
crucial point — long-range along the polymer chain, in the sense that the interaction between two 
monomers depends only on their positions in physical space and is essentially independent of the 
locations of those monomers along the chain. 

4 Here "predominantly repulsive" means "repulsive enough so that the temperature is strictly 
above the theta temperature for the given polymer-solvent pair" . 
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interest such as the number of such walks or their squared end-to-end distances [27]. 
One then performs an extrapolation to the limit N — > oo, using techniques such as the 
ratio method, Pade approximants or differential approximants [28, 29, 30]. Inherent 
in any such extrapolation is an assumption about the behavior of the coefficients 
beyond those actually computed. Sometimes this assumption is fairly explicit; other 
times it is hidden in the details of the extrapolation method. In either case, the 
assumptions made have a profound effect on the numerical results obtained [25]. For 
this reason, the claimed error bars in exact-enumeration/extrapolation studies should 
be viewed with a healthy skepticism. 

In a Monte Carlo study, by contrast, one aims to probe directly the regime of 
fairly long SAWs (usually N 10 2 — 10 s steps). Complete enumeration is unfeasible, 
so one generates instead a random sample. The raw data then contain statistical 
errors, just as in a laboratory experiment. These errors can, however be estimated — 
sometimes even a priori (see Section 7.3) — and they can in principle be reduced to 
an arbitrarily low level by the use of sufficient computer time. An extrapolation to the 
regime of extremely long SAWs is still required, but this extrapolation is much less 
severe than in the case of exact-enumeration studies, because the point of departure 
is already much closer to the asymptotic regime. 

Monte Carlo studies of the self-avoiding walk go back to the early I950's [31, 32], 
and indeed these simulations were among the first applications of a new invention 
— the "high-speed electronic digital computer" — to pure science. 5 These studies 
continued throughout the I960's and I970's, and benefited from the increasingly 
powerful computers that became available. However, progress was slowed by the high 
computational complexity of the algorithms then being employed, which typically 
required a CPU time of order at least N 2+2v = jV~ 3 - 2 to produce one "effectively 
independent" A^-step SAW. This rapid growth with A" of the autocorrelation time — 
called critical slowing-down 6 — made it difficult in practice to do high-precision 
simulations with A" greater than about 30-100. 

In the past decade — since 1981 or so — vast progress has been made in the 
development of new and more efficient algorithms for simulating the self-avoiding 
walk. These new algorithms reduce the CPU time for generating an "effectively 
independent" A^-step SAW from ~A^~ 3 ' 2 to ~A^~ 2 or even ~A^. The latter is quite 
impressive, and indeed is the best possible order of magnitude, since it takes a time 
of order A" merely to write down an A^-step walk! As a practical matter, the new 
algorithms have made possible high-precision simulations at chain lengths A" up to 
nearly I0 5 [39]. 

The purpose of this article is thus to give a comprehensive overview of Monte Carlo 
methods for the self-avoiding walk, with emphasis on the extraordinarily efficient 

5 Here "pure" means "not useful in the sense of Hardy": "a science is said to be useful if its 
development tends to accentuate the existing inequalities in the distribution of wealth, or more 
directly promotes the destruction of human life" [33, p. 120m]. 

6 For a general introduction to critical slowing-down in Monte Carlo simulations, see [34, 35, 36, 
37]. See also [38] for a pioneering treatment of critical slowing-down in the context of dynamic 
critical phenomena. 
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algorithms developed since 1981. I shall also discuss briefly some of the physical 
results which have been obtained from this work. 

The plan of this article is as follows: I begin by presenting background material on 
the self-avoiding walk (Section 2) and on Monte Carlo methods (Section 3). In Section 
4 I discuss static Monte Carlo methods for the generation of SAWs: simple sampling 
and its variants, inversely restricted sampling (Rosenbluth-Rosenbluth algorithm) and 
its variants, and dimerization. In Section 5 I discuss quasi-static Monte Carlo meth- 
ods: enrichment and incomplete enumeration (Redner- Reynolds algorithm). In Sec- 
tion 6 I discuss dynamic Monte Carlo methods: the methods are classified according 
to whether they are local or non-local, whether they are iV-conserving or iV-changing, 
and whether they are endpoint-conserving or endpoint-changing. In Section 7 I dis- 
cuss some miscellaneous algorithmic and statistical issues. In Section 8 I review some 
preliminary physical results which have been obtained using these new algorithms. I 
conclude (Section 9) with a brief summary of practical recommendations and a listing 
of open problems. 

For previous reviews of Monte Carlo methods for the self-avoiding walk, see Kre- 
mer and Binder [40] and Madras and Slade [11, Chapter 9]. 

2 The Self-Avoiding Walk (SAW) 
2.1 Background and Notation 

In this section we review briefly the basic facts and conjectures about the SAW 
that will be used in the remainder of this article. A comprehensive survey of the 
SAW, with emphasis on rigorous mathematical results, can be found in the excellent 
new book by Madras and Slade [11]. 

Real polymers live in spatial dimension d = 3 (ordinary polymer solutions) or in 
some cases in d = 2 (polymer monolayers confined to an interface [41, 42]). Never- 
theless, it is of great conceptual value to define and study the mathematical models 
— in particular, the SAW — in a general dimension d. This permits us to distinguish 
clearly between the general features of polymer behavior (in any dimension) and the 
special features of polymers in dimension d = 3. 7 The use of arbitrary dimensionality 
also makes available to theorists some useful technical tools (e.g. dimensional regu- 
larization) and some valuable approximation schemes (e.g. expansion in d = 4 — e) 
[15]- 

So let C be some regular o?- dimensional lattice. Then an A^-step self-avoiding 

walk 8 (SAW) u on C is a sequence of distinct points cu ,cui, ...,cujv in C such that each 



It turns out that d = 3 is very special, because it is the upper critical dimension for tricritical 
behavior. This is the deep reason underlying the fact that polymers at the theta point in d = 3 are 
"quasi-ideal" (i.e. have size exponent v = \ an d have all dimensionless virial coefficients vanishing 
in the limit of infinite chain length). In dimension d < 3, polymers at the theta point are not 
quasi-ideal [43, 44, 45, 46, 47]. 

8 The term "walk" is a misnomer. The SAW should not be thought of as the path of a particle 
which "walks" (over time). Rather, it should be thought of as the configuration of a polymer chain 
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point is a nearest neighbor of its predecessor. 9 We denote by \u\ = N the number 
of steps in u. For simplicity we shall restrict attention to the simple (hyper-)cubic 
lattice Z ; similar ideas apply with minor alterations to other regular lattices. We 
assume all walks to begin at the origin (lo = 0) unless stated otherwise, and we let 
Sn [resp. Sn(x)] be the set of all iV-step SAWs starting at the origin and ending 
anywhere [resp. ending at x\. 

First we define the quantities relating to the number (or "entropy") of SAWs: 
Let cjv [resp. cn(x)] be the number of A^-step SAWs on Z d starting at the origin and 
ending anywhere [resp. ending at x\. Then cjv and cn(x) are believed to have the 
asymptotic behavior 

c N ~ //"AH- 1 (2.1) 
c N (x) ~ ^N 01 ""*- 2 (x fixed ^ 0) (2.2) 

as A" — > oo; here // is called the connective constant of the lattice, and 7 and a s i ng 
are critical exponents. The connective constant is definitely lattice-dependent, 
while the critical exponents are believed to be universal among lattices of a given 
dimension d. For rigorous results concerning the asymptotic behavior of cjv and 
c N (x), see [11, 48, 49, 50, 51]. 

Next we define several measures of the size of an A^-step SAW: 

• The squared end-to-end distance 

R 2 e = {co N - to f . (2.3) 

• The squared radius of gyration 



a 



I N 1 N \ 

R'i = Y\ui — T ^ (2.4a) 

1 A o ( 1 N x2 



" UttSH (2 - 4b) 



A^ + 1 ^ 1 \N + . ■ n 
1 N 

= —, ^ y {ui - w,) 2 • (2.4c) 

• The mean-square distance of a monomer from the endpoints 



1 N 

Rl = 2(N + 1) S ^ " U ° )2 + {Ul " ""^ ' (2 ' 5) 



i=0 



at one instant of time. [In mathematical terms, the SAW is not a stochastic process (not even a 
non-Markovian one): the trouble is that the equal- weight distributions on TV-step and (N + l)-step 
SAWs are not consistent.] 

9 Note that a SAW is an oriented object, i.e. we distinguish between the starting point (wo) 
and the ending point (wjv). However, all probability distributions and all observables that we shall 
consider are invariant under reversal of orientation (w; = wjv-;). This is necessary if the SAW is 
to be a sensible model of a real homopolymer molecule, which is of course (neglecting end-group 
effects) unoriented. 
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We then consider the mean values (i?g)jv, (R 2 )n and (R 2 m )N in the probability dis- 
tribution which gives equal weight to each iV-step SAW. Very little has been proven 
rigorously about these mean values, but they are believed to have the asymptotic 
behavior 

(RDn, (R 2 3 )n, (R 2 Jn ~ (2.6) 

as N — > oo, where v is another (universal) critical exponent. Moreover, the amplitude 
ratios 

{R e )N 
{R e )N 

are expected to approach universal values in the limit N — > oo. 10,11 

Finally, let cjq lt jq 2 be the number of pairs (u/ 1 ), u/ 2 )) such that is an iVi-step 
SAW starting at the origin, is an A^-step SAW starting anywhere, and and 
cu( 2 ) have at least one point in common (i.e. ^ 0). Then it is believed that 

cn u n 2 ~ ^HN^f^'-^giNjN,) (2.9) 

as Ni 7 N 2 — > oo, where A 4 is yet another (universal) critical exponent and g is a 
(universal) scaling function. 

The quantity CN lt N 2 i s closely related to the second virial coefficient. To see this, 
consider a rather general theory in which "molecules" of various types interact. Let 
the molecules of type i have a set Si of "internal states", so that the complete state of 
such a molecule is given by a pair (x, s) where x £ T d is its position and s £ Si is its 
internal state. Let us assign Boltzmann weights (or "fugacities") Wi(s) [s £ Si] to the 
internal states, normalized so that J2 s es t Wi(s) = 1; and let us assign an interaction 
energy Vij ( ( ' £ Z , s £ Si, s 1 £ Sj] to a molecule of type i at (x, s) 

interacting with one of type j at (x' } s'). Then the second virial coefficient between a 
molecule of type i and one of type j is 

B { 2 tj) = - E E Wi^Wjis') [l - e -Vo-((o,»),(*V))] . (2.10) 

In the SAW case, the types are the different lengths N, the internal states are the 
conformations u £ Sn starting at the origin, the Boltzmann weights are Wn(lo) = 



10 For a general discussion of universal amplitude ratios in the theory of critical phenomena, see 
[52]. 

11 Very recently, Hara and Slade [48, 49] have proven that the SAW in dimension d > 5 converges 
weakly to Brownian motion when N — ► oo with lengths rescaled by CN 1 ! 2 for a suitable (nonuniver- 
sal) constant C. It follows from this that (2.6) holds with v = and also that (2.7)/(2.8) have the 
limiting values A x = ^, = Earlier, Slade [53, 54, 55] had proven these results for sufficiently 
high dimension d. See also [11]. 
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1/cjv for each u £ <Sjv, and the interaction energies are hard-core repulsions 



-,, ( ( \ i i i\\ / +00 if (to + x) fl (to 1 + x') ^ / 1 1 \ 

v 7 I U otherwise 

It follows immediately that 

B (N U N 2 ) = _CN l2 N^_ {2A2) 

2cjVj cjv 2 

The second virial coefficient B^ 1,N2 ^ is a measure of the "excluded volume" be- 
tween a pair of SAWs. It is useful to define a dimensionless quantity by normalizing 
by some measure of the "size" of these SAWs. Theorists prefer (R 2 e ) as the 
measure of size, while experimentalists prefer (R 2 ) since it can be measured by light 
scattering. We follow the experimentalists and define the interpenetration ratio 

B (N,N) 

m N = 2(dll2^fl 2 -^ m (2.13a) 

c N\ rL g/N 

(for simplicity we consider only Ni = N 2 = N). The numerical prefactor is a conven- 
tion that arose historically for reasons not worth explaining here. Crudely speaking, 
^ measures the degree of "hardness" of a SAW in its interactions with other SAWs. 12 
vFjv is expected to approach a universal value ^* in the limit N — > 00. A deep 
question is whether ^* is nonzero (this is called hyperscaling). It is now known 
that hyperscaling fails for SAWs in dimension d > 4 [11, 48, 49]. It is believed that 
hyperscaling holds for SAWs in dimension d < 4, but the theoretical justification 
of this fact is a key unsolved problem in the theory of critical phenomena (see e.g. 

[39]).; 3 

Higher virial coefficients can be defined analogously, but the details will not be 
needed here. 



12 A useful standard of comparison is the hard sphere of constant density: 

f « 1.12838 inrf=l 

* - 2 (i±l V _ J 4/3 in d = 2 

hard - sphere ~ dT(d/2){ 3 J ~ |«1.61859 mrf=3 

I 2 in d = 4 

13 A very beautiful heuristic argument concerning hyperscaling for SAWs was given by des 
Cloizeaux [56]. Note first from (2.13b) that 'f measures, roughly speaking, the probability of inter- 
section of two independent SAWs that start a distance of order (R 2 ,) 1 ! 2 ~ N" apart. Now, by (2.6), 
we can interpret a long SAW as an object with "fractal dimension" 1/v. Two independent such ob- 
jects will "generically" intersect if and only if the sum of their fractal dimensions is at least as large 
as the dimension of the ambient space. So we expect 'f* to be nonzero if and only ii 1/ v -\- 1/ v > d, 
i.e. dv < 2. This occurs for d < 4. (For d = 4 we believe that dv = "2 + logs", and thus expect a 
logarithmic violation of hyperscaling.) 
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Remark. The critical exponents defined here lor the SAW are precise analogues ol 
the critical exponents as conventionally defined for ferromagnetic spin systems [57, 58]. 
Indeed, the generating functions of the SAW are equal to the correlation functions 
of the n-vector spin model analytically continued to n = [59, 60, 61, 62, 11]. This 
"polymer-magnet correspondence" 14 is very useful in polymer theory; but we shall 
not need it in this article. 

2.2 The Ensembles 

Different aspects of the SAW can be probed in four different ensembles 15 : 

• Fixed-length, fixed-endpoint ensemble (fixed N, fixed x) 

• Fixed-length, free-endpoint ensemble (fixed N, variable x) 

• Variable-length, fixed-endpoint ensemble (variable N, fixed x) 

• Variable-length, free-endpoint ensemble (variable N, variable x) 

The fixed-length ensembles are best suited for studying the critical exponents v and 
2A 4 — 7, while the variable-length ensembles are best suited for studying the connec- 
tive constant // and the critical exponents a s i ng (fixed-endpoint) or 7 (free-endpoint). 
Physically, the free-endpoint ensembles correspond to linear polymers, while the fixed- 
endpoint ensembles with \x\ = 1 correspond to ring polymers. 

All these ensembles give equal weight to all walks of a given length; but the 
variable-length ensembles have considerable freedom in choosing the relative weights 
of different chain lengths N. The details follows: 

Fixed-A^, fixed-x ensemble. The state space is Sn(x) } and the probability 
distribution is tt(lo) = 1/cn(x) for each u £ Sn(x). 

Fixed-A^, variable-x ensemble. The state space is Sn, and the probability 
distribution is tt(lo) = 1/cjy for each u £ Sn- 

00 

Variable- N, fixed-x ensemble. The state space is S(x) = [j Sn{x), and the 

N=0 

probability distribution is generally taken to be 

TTf3, p (u) = f3 H \u\ p /Z(f3 } p;x) for each to £ S(x) (2.14) 

14 It is sometimes called the "polymer-magnet analogy", but this phrase is misleading: at least for 
SAWs (athermal linear polymers), the correspondence is an exact mathematical identity [11, Section 
2.3], not merely an "analogy". 

15 The proper terminology for these ensembles is unclear to me. The fixed-length and variable- 
length ensembles are sometimes called "canonical" and "grand canonical" , respectively (based on 
considering the monomers as particles). On the other hand, it might be better to call these ensembles 
"microcanonical" and "canonical", respectively (considering the polymers as particles and the chain 
length as an "energy") — reserving the term "grand canonical" for ensembles of many SAWs. My 
current preference is to avoid entirely these ambiguous terms, and simply say what one means: 
"fixed-length", "variable-length", etc. 
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where 

CO 

Z(P,p;x) = /3 N N p c N {x) . (2.15) 

N=0 

Here p > is a fixed number (usually or f), and f3 is a monomer fugacity that 
can be varied between and f3 c = f//i. By tuning f3 we can control the distribution 
of walk lengths N. Indeed, from (2.2) we have 

/ N \ ~ p + a ^ ~ 1 (2.16) 
1 — pn 

as f3 | /3 C , provided that p + a S m 3 > l. 16 Therefore, to generate a distribution of 
predominantly long (but not too long) walks, it suffices to choose f3 slightly less than 
(but not too close to) f3 c . 

oo 

Variable-A^, variable-x ensemble. The state space is S = [j Sn, and the 

N=0 

probability distribution is generally taken to be 

TTf3, p (u) = f3 H \u\ p /Z(f3 } p) foreachcuG^ (2.17) 

where 

oo 

Z(P,p) = ^2p N N p c N . (2.18) 

N=0 

p and P before, and from (2.1) we have 

(N) * ^ (2.19) 

as P | P c . (Here the condition p + 7 > is automatically satisfied, as a result of the 
rigorous theorem 7 > 1 [11].) 

An unusual two-SAW ensemble is employed in the join-and-cut algorithm, as will 
be discussed in Section 6.6.2. 



3 Monte Carlo Methods: A Review 

Monte Carlo methods can be classified as static, quasi-static or dynamic. Static 
methods are those that generate a sequence of statistically independent samples from 
the desired probability distribution tt. Quasi-static methods are those that generate 
a sequence of statistically independent batches of samples from the desired probabil- 
ity distribution 7r; the correlations within a batch are often difficult to describe. 
Dynamic methods are those that generate a sequence of correlated samples from 
some stochastic process (usually a Markov process) having the desired probability 
distribution tt as its unique equilibrium distribution. 

In this section we review briefly the principles of both static and dynamic Monte 
Carlo methods, with emphasis on the issues that determine the statistical efficiency 
of an algorithm. 

16 If < p + a s ing < 1, then (N) ~ (1 — /J^) - ^"""!;) as (3 ] (3 C , with logarithmic corrections when 
P+ o; s i„g = 0, 1. If p+ a s ing < 0, then (N) remains bounded as [3 ] f3 c . 
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3.1 Static Monte Carlo Methods 



Consider a system with state space (configuration space) S; for notational 
simplicity, let us assume that S is discrete (i.e. finite or countably infinite). Now 
let 7r = {ir x } xe s be a probability distribution on S, and let A = {A(x)} xe s be a 
real- valued observable. Our goal is to devise a Monte Carlo algorithm for estimating 
the expectation value 

(A) v = ]T ^ A(x) . (3.1) 
xes 

The most straightforward approach (standard Monte Carlo) is to generate 
independent random samples Xi, . . . } X n from the distribution tt (if one can!), and 
use the sample mean 

i n 

A = r £A(X 8 ) (3.2) 



n ■ -. 

8 = 1 



as an estimate of (A) v . This estimate is unbiased, i.e. 

(A) = (A)ir . (3.3) 

Its variance is 

var(A) = {A 2 } - (A) 2 

= -var jr (A) 
n 

= I ~ (A)l] . (3.4) 

However, it is also legitimate to generate samples Xi, . . . , X n from any probability 
distribution v, and then use weights W(x) = tt t J v x . There are two reasons one might 
want to sample from v rather than tt. Firstly, it might be unfeasible to generate 
(efficiently) random samples from 7r, so one may be obliged to sample instead from 
some simpler distribution v. This situation is the typical one in statistical mechanics. 
Secondly, one might aspire to improve the efficiency (i.e. reduce the variance) by 
sampling from a cleverly chosen distribution v. 

There are two cases to consider, depending on how well one knows the function 
W(x): 

(a) VK^x) is known exactly. [Note that v x W(x) = 1 and '^xW(x)~ 1 = 1.] 

x(zS x(zS 

(b) W(x) is known except for an unknown multiplicative constant (normalization 
factor). This case is common in statistical mechanics: if tt x = Z~^ x e~^ H ^ and 
v x = Zp}e- p ' H ^ x \ then W(x) = (Z p , / Z^e'^-^ 11 ^ but we are unlikely to 
know the ratio of partition functions. 

In the first case, we can use as our estimator the weighted sample mean 

1 71 

AW = -J2W(X t )A(X t ). (3.5) 

8 = 1 
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This estimate is unbiased, since 

(A W > = {WA) V = (A) v . (3.6) 

Its variance is 

r(F») = - \((WAY)„ - (WA); 



var 



n 



= l\(WAX-(A)l] (3.7) 

This estimate can be either better or worse than standard Monte Carlo, depending 
on the choice of v. The optimal choice is the one that minimizes (WA 2 )^ subject to 
the constraint (W -1 )^ = 1, namely 

W = EOT ' <3 ' 8) 

xes 

or in other words v x = const X ^(x)!^. In particular, if A(x) > the resulting 
estimate has zero variance. But it is impractical: in order to know W(x) we must 
know the denominator in (3.8), which is the quantity we were trying to estimate in 
the first place! Nevertheless, this result offer some practical guidance: we should 
choose W{x)~ x to mimic as closely as possible, subject to the constraint that 

t^xW{x)~ 1 be calculable analytically (and equal to 1). 

xes 

In the second case, we have to use a ratio estimator 

n 

J2W(X t )A(X t ) 

E w(xa 

8 = 1 

here the unknown normalization factor in W cancels out. This estimate is slightly 
biased: using the small-fluctuations approximation 



y\ „ (y) 



^ cov(Y, Z) var(Z) 



(Y)(z) (zy< 



(3.10) 



Zl (Z) 
we obtain 

n 1 J \n z J 

Since the bias is of order 1/n, while the standard deviation (= square root of the 
variance) is of order 1/y/n, the bias is normally negligible compared to the statistical 
fluctuation. 17 The variance can also be computed by the small-fluctuations approxi- 



17 Note that 

\(WA) W -(W) W (A) W \ < {W(A-{A),) 2 ) 1 J 2 ({W), -I) 1 ' 2 

(with equality if and only if A = c\ + C'jW -1 ) by the Schwarz inequality with measure v applied to 
the functions W — 1 and W(A — (A) w ). Therefore, from (3.11) and (3.13) we have (to leading order 
in l/n) 

\hm S (A (W ' rat ^)\ < n- 1 / 2 var(i( W '™* i °)) 1 / 2 ((W% -I) 1 ' 2 . 
So the bias is <C the standard deviation unless (W) w is enormous. 



13 



mation 



,'Y\ (Y) 2 ( Y Z \ . 

Var( zJ = (Zp Var ((F)-(Z)J (3 ' 12a) 

= ^ " Jjj£™(Y,Z) + |||^var(Z) ; (3.12b) 



it is 



var(A(™°)) = -(W{A- {A)^f)_ + 0(3) . (3.13) 



-<y(A-(AK) 2 ) +0(- 

The optimal choice of z/ is the one that minimizes (VU(A — (A)^) 2 )^ subject to the 
constraint (W -1 )^ = 1, namely 

wo*)- 1 = J a[ u^! a )m i • ^ 

E k x \A(x) - (A) v \ 
xes 

Let us now try to interpret these formulae. First note that 

(W)„ - 1 = (W 2 ) v - (W)l = Yax„(W) > , (3.15) 

with equality only if v = tt. So (VU}^ — 1 measures, in a rough sense, the "mismatch" 
(or "distance") between v and tt. Now assume for simplicity that A is a bounded 
observable, i.e. < M for all x £ S. Then it is immediate from (3.7) and (3.13) 

that 

/If 2 

var(A^) < —IW)„ (3.16) 
n 

varM^™ )) < —(W)„ + o(lr) (3.17) 

So the variances cannot get large unless (VK)^ ^> 1, i.e. v is very distant from 7r; 
and in this case it is easy to see that the variances can get large. The moral is this: 
when the probability distribution actually simulated (y) differs considerably from the 
distribution of interest (7r), the variance of the Monte Carlo estimates can be vastly 
higher than one might expect naively for the given sample size n. Heuristically, this 
is because the states (configurations) that are "typical" for tt are "rare" for z/, so the 
useful sample size is much smaller than the total sample size. 

Here is a concrete example: Let S be the set of all iV-step walks (not necessarily 
self-avoiding) starting at the origin. Let tt be uniform measure on self-avoiding walks, 
i.e. 

^ _ | 1/cjv if 00 is self-avoiding ^ ^ 

1 otherwise 

Unfortunately, it is not easy to generate (efficiently) random samples from 7r (that is 
the subject of this article!). So let us instead generate ordinary random walks, i.e. 
random samples from 

v w = (2d)~ N for all u £ S , (3.19) 
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and then apply the weights W(u) = tt^/v^. Clearly we have 

m. = ^ = (— )" . ,3.2.) 

which grows exponentially for large N. Therefore, the efficiency of this algorithm 
deteriorates exponentially as N grows. 

See [63, Chapter 5] for some more sophisticated static Monte Carlo techniques. 
It would be interesting to know whether any of them can be applied usefully to the 
self-avoiding walk. 



3.2 Dynamic Monte Carlo Methods 

In this subsection we review briefly the principles of dynamic Monte Carlo meth- 
ods, and define some quantities (autocorrelation times) that will play an important 
role in the remainder of this article. 

The idea of dynamic Monte Carlo methods is to invent a stochastic process with 
state space S having tt as its unique equilibrium distribution. We then simulate this 
stochastic process, starting from an arbitrary initial configuration; once the system 
has reached equilibrium, we measure time averages, which converge (as the run time 
tends to infinity) to 7r-averages. In physical terms, we are inventing a stochastic 
time evolution for the given system. It must be emphasized, however, that this time 
evolution need not correspond to any real "physical" dynamics: rather, the dynamics 
is simply a numerical algorithm, and it is to be chosen, like all numerical algorithms, 
on the basis of its computational efficiency. 

In practice, the stochastic process is always taken to be a Markov process. We 
assume that the reader is familiar with the elementary theory of discrete-time Markov 
chains. 18 

For simplicity let us assume that the state space S is discrete (i.e. finite or count- 
ably infinite); this is the case in nearly all the applications considered in this arti- 
cle. Consider a Markov chain with state space S and transition probability matrix 
P = {p(x — > y)} = {p xy } satisfying the following two conditions: 

(A) For each pair x,y £ S, there exists an n > for which p^ a J > 0. Here p^ a J = 
(P n ) xy is the n-step transition probability from x to y. [This condition is called 
irreducibility (or ergodicity); it asserts that each state can eventually be 
reached from each other state.] 

(B) For each y £ S, 

^2 77 * Vxy = Ky (3.21) 

xes 

[This condition asserts that it is a stationary distribution (or equilibrium 
distribution) for the Markov chain P = {p xy }-] 

18 The books of Kemeny and Snell [64] and Iosifescu [65] are excellent references on the theory 
of Markov chains with finite state space. At a somewhat higher mathematical level, the books of 
Chung [66] and Nummelin [67] deal with the cases of countable and general state space, respectively. 
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In this case it can be shown [66] that tt is the unique stationary distribution for the 
Markov chain P = {p xy } } and that the occupation-time distribution over long time 
intervals converges (with probability 1) to 7r, irrespective of the initial state of the 
system. If, in addition, P is aperiodic [this means that for each pair x,y £ S, 
p^ a J > for all sufficiently large n], then the probability distribution at any single 
time in the far future also converges to 7r, irrespective of the initial state — that is, 
lim^oopj^ = ir y for all x. 

Thus, simulation of the Markov chain P provides a legitimate Monte Carlo method 
for estimating averages with respect to tt. However, since the successive states 
X ,Xi, ... of the Markov chain are in general highly correlated, the variance of esti- 
mates produced in this way may be much higher than in independent sampling. To 
make this precise, let A = {A(x)} xe s be a real-valued function defined on the state 
space S (i.e. a real-valued observable) that is square-integrable with respect to tt. 
Now consider the stationary Markov chain (i.e. start the system in the stationary 
distribution 7r, or equivalently, "thermalize" it for a very long time prior to observing 
the system). Then {A t } = {A(X t )} is a stationary stochastic process with mean 

fi A = {A t ) = ^2ir x A(x) (3.22) 

xes 

and unnormalized autocorrelation function 19 



C AA (t) = (A s As+t) ~ /4 (3-23) 
= £ A(x) - * x * y ] A(y) . 

x,y£S 

The normalized autocorrelation function is then 



PAA (t) = C AA (t)/C AA (0) . (3.24) 

Typically p AA (t) decays exponentially (~ e~'*'/ T ) for large t; we define the exponen- 
tial autocorrelation time 



t 

Texp,A = I™ SUp : — (3.25) 

t^oo ~ log \p AA (t)\ 



and 

r exp = sup r exP:A . (3.26) 

A 

Thus, T exp is the relaxation time of the slowest mode in the system. (If the state space 

' e X p 



is infinite, T exp might be +oo!) 20 



19 In the statistics literature, this is called the autocovariance function. 

20 An equivalent definition, which is useful for rigorous analysis, involves considering the spectrum 
of the transition probability matrix P considered as an operator on the Hilbert space I 2 (it). [I 2 (it) 
is the space of complex- valued functions on S that are square-integrable with respect to 7r: \\A\\ = 
(%2xss 7r a;| J 4-( ;c )| 2 ) 1 ^ 2 < 00 • The inner product is given by (A, B) = J2 xeS ir x A(x)* B(x).] It is not 
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On the other hand, for a given observable A we define the integrated autocor- 
relation time 



oo 

r in t,A = - Paa {t) (3.27) 

^ t=-oo 

oo 

L t=l 

[The factor of | is purely a matter of convention; it is inserted so that r,- ni|J 4 ~ T exP: A 
if pAA(t) ~ e~™ T with t ^> 1.] The integrated autocorrelation time controls the 
statistical error in Monte Carlo estimates of (A). More precisely, the sample mean 

i n 

A = -Y,A t (3.28) 



n t=i 



has variance 

1 



(A) = - £ C AA (r-s) (3.29) 



var,.., 

r,s=l 



1 ^ /_ It 



- E (i-t)^) ( 3 - 3 °) 

1 



n t=-( n -l) 



-(2r mM )C^(0) forn>r (3.31) 
ii 

Thus, the variance of A is a factor 2Ti ntj A larger than it would be if the {A t } were 
statistically independent. Stated differently, the number of "effectively independent 
samples" in a run of length n is roughly n/2Ti ntj A- 

In summary, the autocorrelation times T exp and r,- ni|J 4 play different roles in Monte 
Carlo simulations. T exp controls the relaxation of the slowest mode in the system; in 
particular, it places an upper bound on the number of iterations ndi sc which should be 
discarded at the beginning of the run, before the system has attained equilibrium (e.g. 



hard to prove the following facts about P: 

(a) The operator P is a contraction. (In particular, its spectrum lies in the closed unit disk.) 

(b) 1 is a simple eigenvalue of P, as well as of its adjoint P* , with eigenvector equal to the 
constant function 1. 

(c) If the Markov chain is aperiodic, then 1 is the only eigenvalue of P (and of P*) on the unit 
circle. 

(d) Let R be the spectral radius of P acting on the orthogonal complement of the constant 
functions: 

R = inf jr: spec (P |\ l x ) C {A: |A| < r}} . 

Then R = e" 1 /^ _ 

Facts (a)-(c) are a generalized Perron- Frobenius theorem [68]; fact (d) is a consequence of a gener- 
alized spectral radius formula [69]. Note that the worst-case rate of convergence to equilibrium from 
an initial nonequilibrium distribution is controlled by R, and hence by r exp . 
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ridisc ~ 20r e:E p is usually more than adequate). On the other hand, r,- ni|J 4 determines 
the statistical errors in Monte Carlo estimates of (A), once equilibrium has been 
attained. 

Most commonly it is assumed that T exp and r,- ni|J 4 are of the same order of magni- 
tude, at least for "reasonable" observables A. But this is not true in general. In fact, 
one usually expects the autocorrelation function p A A(t) to obey a dynamic scaling 
law [70] of the form 

p AA {t\P) ~ \t\- a F((f3-f3 c )\t\ b ) (3.32) 

valid in the limit 

ft - ft c -> 0, |*| -> oo, x = (ft- ft c ) \t\ b fixed. (3.33) 

Here a, b > are dynamic critical exponents and F is a suitable scaling function; ft 
is some "temperature-like" parameter, and ft c is the critical point. Now suppose that 
F is continuous and strictly positive, with F(x) decaying rapidly (e.g. exponentially) 
as | a; | — > oo. Then it is not hard to see that 

r exp , A ~ \ft-ftc\- 1/b (3.34) 
T in t,A ~ \ft - ft c \- (1 - a)lb (ifa<l) (3.35) 
p AA (t;ft = ft c ) ~ |t|" a (3.36) 

so that T exPt A and r,- ni|J 4 have different critical exponents unless a = 0. 21 Actually, 
this should not be surprising: replacing "time" by "space", we see that T exPt A is 
the analogue of a correlation length, while r,- ni|J 4 is the analogue of a susceptibility; 
and (3.34)-(3.36) are the analogue of the well-known scaling law 7 = (2 — i])v — 
clearly 7 / j/ in general! So it is crucial to distinguish between the two types of 
autocorrelation time. 

Returning to the general theory, we note that one convenient way of satisfying the 
stationarity condition (B) is to satisfy the following stronger condition: 

(B') For each pair x,y £ S, ir x p xy = Tr y p yx . (3.37) 

[Summing (B') over x, we recover (B).] (B') is called the detailed-balance con- 
dition; a Markov chain satisfying (B') is called reversible. 22 (B') is equivalent to 
the self-adjointness of P as on operator on the space / 2 (vr). In this case, it follows 
from the spectral theorem that the autocorrelation function C^(t) has a spectral 
representation 

1 

C AA (t) = j \Md<r AA (\) (3.38) 



21 Our discussion of this topic in [71] is incorrect. A correct discussion can be found in [72]. 

22 For the physical significance of this term, see Kemeny and Snell [64, section 5.3] or Iosifescu [65, 
section 4.5]. 
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with a nonnegative spectral weight Jcr^(A) supported on the interval 
[-e" 1/Te ^> A , e~ 1,Tex P' A }. It follows that 

1 /1 + e-i/^.AX 1 A + e" 1 /^ 
TmM ~ 2 \1 - e- x l T -v,A J - 2 \1 - e" 1 /^ 

There is no particular advantage to algorithms satisfying detailed balance (rather 
than merely satisfying stationarity), but they are easier to analyze mathematically. 

Finally, let us make a remark about transition probabilities P that are "built up 
out of" other transition probabilities Pi, P 2 , . . . , P„: 

a) If Pi,P 2 ,...,P„ satisfy the stationarity condition (resp. the detailed-balance 
condition) for 7r, then so does any convex combination P = Ya=\ ^iPi- Here 
A; > and £"=1 A,- = 1. 

b) If Pi, P 2 , • • • , Pn satisfy the stationarity condition for 7r, then so does the product 
P = P\Pi ■ ■ ■ P n . (Note, however, that P does not in general satisfy the detailed- 
balance condition, even if the individual Pi do. 23 ) 

Algorithmically, the convex combination amounts to choosing randomly , with prob- 
abilities {A 8 }, from among the "elementary operations" P;. (It is crucial here that 
the A; are constants, independent of the current configuration of the system; only in 
this case does P leave tt stationary in general.) Similarly, the product corresponds to 
performing sequentially the operations Pi, P 2 , . . . , P n . 



(3.39) 



4 Static Monte Carlo Methods for the SAW 
4.1 Simple Sampling and Its Variants 

The most obvious static technique for generating a random iV-step SAW is simple 
sampling: just generate a random A^-step ordinary random walk (ORW), and reject 
it if it is not self-avoiding; keep trying until success. It is easy to see that this 
algorithm produces each A^-step SAW with equal probability. Of course, to save time 
we should check the self-avoidance as we go along, and reject the walk as soon as 
a self-intersection is detected. (Methods for testing self-avoidance are discussed in 
Section 7.1.2.) The algorithm is thus: 

23 Recall that if A and B are self-adjoint operators, then AB is self-adjoint if and only if A and B 
commute. 
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title Simple sampling, 
function ssamp(iV) 

comment This routine returns a random iV-step SAW. 
UJ <— 

start: for i = 1 to N do 

LOi <— a random nearest neighbor of LOi-i 

if LOi £ {cu , . . . goto start 

enddo 
return to 

(Here and in what follows, we will express algorithms in "pseudocode". Translation 
to your favorite language — Fortran, C or whatever — is almost always trivial.) 

The trouble with this algorithm is, of course, the exponentially rapid sample 
attrition for long walks. Clearly, the probability of an iV-step walk being self-avoiding 
is cjv / (2d) N , which behaves for large N as 

( / i/2d) N N' 1 - 1 (4.1a) 

e -XN N1 -l ( 41b ) 

log(2<*///) (4.2) 

is called the attrition constant. Therefore, the mean number of attempts required 
to generate an iV-step SAW is (2d) N / Cjv , which grows roughly as e XN . And the mean 
CPU time per attempt is of order min(l/A, N). So this method is extremely ineffi- 
cient in generating SAWs of length N J> 10/ A. For the simple (hyper-)cubic lattices in 
dimensions 2, 3 and 4, the values of A are approximately 0.42, 0.25 and 0.17, respec- 
tively (see Table 1). So it is unfeasible to generate SAWs of length more than 20-60 
steps by simple sampling. All alternative SAW Monte Carlo techniques are aimed es- 
sentially at alleviating this attrition problem — hopefully without introducing other 
problems of equal or greater severity! 

Some improvement can be obtained by modifying the walk-generation process so 
as to produce only walks without immediate reversals (such walks are called non- 
reversal random walks (NRRWs) or memory-2 walks). The algorithm is thus: 

title Non-reversal simple sampling, 
function nrssamp(A^) 

comment This routine returns a random A^-step SAW. 
UJ <— 

u\ <— a random nearest neighbor of 
start: for i = 2 to N do 

LOi <— a random nearest neighbor of not equal to cu 8 _ 2 

if LOi £ {cu , . . . goto start 

enddo 
return to 



c N 



(2d) 



N 



where 
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d 


// 


A 


A' 


2 


2.638158 5 (10) [73, 27] 


0.416 


0.129 


3 


4.683 907(22) [74] 


0.248 


0.065 


4 


6.772 0(5) [75] 


0.167 


0.033 


5 


8.838 6(8) [76] 


0.123 


0.018 


6 


10.878 8(9) [76] 


0.098 


0.011 


d — > oo 


2d-l- (2d)- 1 - ... [77, 78, 79, 50, 51] 


(2d)- 1 + ... 


(2d)~ 2 + ... 



Table 1: Connective constant fi and attrition constants A and A' for simple 
(hyper-)cubic lattices in dimensions 2 < d < 6 and d — > oo. Estimated errors in 
the last digit(s) are shown in parentheses. 



Then (2d) N is replaced by 2d(2d — 1) N 1 , and the attrition rate is 

A' = log . (4.3) 

H 

For comparison, A' is approximately 0.13, 0.07 and 0.03, respectively, for d = 2,3,4. 
This is much smaller than in the unmodified scheme, but the exponential attrition is 
still prohibitive for walks of length more than 80-300 steps. 

The logical next step is to modify the walk-generation process so that walks with 
loops of length < r are automatically absent. Let us start by building the walk 
out of strides of r steps [80]. 24 That is, let us enumerate in advance all the r-step 
SAWs — call them u/ 1 ), . . . ,u^ Cr \ (Obviously this takes a memory of order rc r , and 
so is feasible only if r is not too large.) We then build up the walk by repeated 
concatenation of strides. For simplicity let us assume that N is a multiple of r: 

title Simple r-step stride sampling, 
function simstride(r, k) 

comment This routine returns a random &r-step SAW. 

start: u <— {0} (zero-step SAW at the origin) 
for i = 1 to k do 

a <— a random integer from the set {1, . . . , c r } 

u <— u o u/ a ) (concatenation) 

if to is not self-avoiding goto start 
enddo 
return to 



24 The treatment in the remainder of this section relies heavily on [11, Section 9.3.1], which is in 
turn an explication of [63, p. 129]. 
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d 


r 




rc r 


AM = log(cJ/7//) 


a t( r \ i / / Or} 1 \ 1 1 V i \ 

A'W = log ((2^i Cr) i/ 7/U ) 


2 


10 


44100 


441000 


0.099 


0.071 


3 


7 


81390 


569730 


0.071 


0.045 


4 


6 


127160 


762960 


0.046 


0.024 


5 


5 


64250 


321250 


0.035 


0.014 


6 


5 


173172 


865860 


0.026 


0.008 



Table 2: Attrition constants A^ and A'^ for simple and non-reversal r-stride sam- 
pling. For each d } we have taken the largest r such that rc r < 10 6 . 




The probability of surviving to length N = kr is 

N 

(4.4) 

There is still exponential attrition (since c r > //), but this attrition can in principle 
be made arbitrarily small by taking r large (since lim^oo c 1 J' r = p). In practice we 
can probably handle rc r of order 10 6 on a modern-day workstation. 25 The resulting 
attrition rates A^ are shown in Table 2. They are far from spectacular; the trouble 
is that c 1 J' r converges rather slowly to /i. 26 

Of course, we can do better by choosing from among only those r-step walks 
whose first step is not opposite to the last step of the current u. In this non-reversal 
r-step stride sampling, the probability of surviving to length N = kr is 



/ 

Cjv 



2d-l_ c xk - 



2d C 7 \l 2d 



2d- 



~-c> 



N 

(4.5) 



The resulting attrition rates A'^ are shown in the last column of Table 2. 

Neither of these algorithms in fact eliminates all loops of length < r, because such 
loops can be formed by the concatenation of two r-step strides. But we can eliminate 
such loops if we are willing to pre-compute the list of legal pairs of strides. That is, 
for each index a(l<a<c r ), we make a list L a containing those indices f3 such 
that to^ a ) o u)W is self- avoiding. (This takes a memory of order c 2 .) Now, it would 
not be correct to choose at each stage of the algorithm a random SAW from the 
appropriate list L a \ the trouble is that the lists do not all have the same number of 



25 0f course, we can reduce the memory requirements by at least a factor 2c? by exploiting sym- 
metry, e.g. storing only those r-step SAWs whose first step is in some particular direction. But this 



only increases the feasible r by about 1. 

26 From (2.1) we have c r /r « fi [l + il^fll + O(I) 
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elements, and as a result the walks would not be generated with uniform probability 
(see also Section 4.2). Instead, we must allow the possibility of "rejections". Let 
c* = max \L a \ be the number of elements in the largest list. We can then perform: 

l<a<c r 

title Super-duper r-step stride sampling, 
function supstride(r, k) 

comment This routine returns a random kr-step SAW. 

oc\ <— a random integer from the set {1, . . . , c r } 
start: u <— u^" 1 ^ 

for i = 2 to k do 

m <— a random integer from the set {1, . . . , c*} 
if m > \L cti _ 1 \ goto start (this is the "rejection") 
ai <— the m th element from L cti _ 1 

U <— U O 

if to is not self-avoiding goto start 
enddo 
return to 

Clearly the probability of surviving to length N = kr is 

c r (c*y-i ~ {( K y/ 

Little is known about c*, but it is certainly < ^f^"C r , and probably not much less. In 
practice, the extravagant memory requirements of this method limit r to very small 
values; for a given amount of memory, non-reversal stride sampling probably works 
better. 27 




4.2 Inversely Restricted Sampling (Rosenbluth-Rosenbluth 
Algorithm) 

The exponential attrition of simple sampling and its variants arises from the fact 
that each new step of the walk might lead to a self-intersection. So it is tempting to 
envisage an algorithm in which one chooses randomly (with equal probability) from 
among only those next steps which do not lead to a self-intersection (assuming such 
steps exist). Unfortunately, this means that SAWs are not generated with uniform 
probability; rather, the probability that this algorithm generates a given A^-step SAW 
to is 

N 1 

P(lo) = const x []-, (4.7) 



27 Actually, it suffices to store the list lengths \L a \, and not the lists themselves. One can then 
choose a j by repeatedly trying to find a stride compatible with lu^"'- 1 ^, together with a probability 
1 — \L at _ 1 \/c* of giving up ("rejection") at each try. 
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where ki = ki(io 0} . . . , cu 8 _i) is the number of choices available at step i. 28 Therefore, 
each walk must be assigned a weight W(u) ~ 1/P(cu), and the mean value of an 
observable 0{u) must be estimated from a sample of walks by a ratio 

of weighted averages: 

(0(u)) « . (4.8) 

8 = 1 

This method is known as inversely restricted sampling [81, 82]: 

title Inversely restricted sampling, 
function irsamp(iV) 

comment This routine returns an iV-step SAW and its weight factor. 
UJ <— 

start: weight <- l/[2d(2d - l^' 1 ] (th is is merely a convenient normalization) 
for i = 1 to N do 

Si <— set of all nearest neighbors of LOi-i not contained in {cu , . . . 

if Si = goto start (the walk is "trapped") 

LOi <— a random element of Si 

weight <— weight X 
enddo 

return (cu, weight) 

This method has several difficulties: Firstly, there is still exponential sample at- 
trition for long walks (although at a much slower rate than in simple sampling): it is 
caused now not by mere self-intersection, but by "trapping". This is most serious in 
d = 2 [83, 84, 85, 86]. Secondly, a ratio estimator (4.8) is slightly biased; however, as 
discussed in Section 3.1, this difficulty is negligible for large sample size n [87]. The 
most serious difficulty is that "the weights are almost certain to get out of hand, a few 
of them being very much larger than all the rest. This means that the greater part 
of the data, corresponding to the negligible weights, gets ignored" [63, p. 131]. Thus, 
the variance of the estimates will be vastly higher than one might expect naively for 
the given sample size n. This is in fact a general problem in any Monte Carlo work 
in which the probability distribution actually simulated differs considerably from the 
distribution of interest (see Section 3.1). In the case at hand, one expects that the 
discrepancy between the two distributions will grow exponentially as the chain length 
N gets large. This has been verified numerically by Batoulis and Kremer [86], who 
conclude that for large N inversely restricted sampling is inferior to non-reversal 
simple sampling. 

Fraser and Winnik [88] have proposed a generalization of inversely restricted sam- 
pling, based on strides with cleverly chosen probabilities. Meirovitch [89, 90, 91, 
92] has introduced a slightly different generalization (which he calls the "scanning 

28 Here "const" is in fact the reciprocal of the probability of surviving in this algorithm to N steps 
without getting "trapped" . 
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method"), in which the algorithm looks ahead more than one step. But neither of 
these methods appears to avoid the exponential explosion of weights, although they 
may reduce it. 

4.3 Dimerization 

The dimerization algorithm [93] is an implementation of the computer scien- 
tists' principle of "divide and conquer" [94, Section 1.3] [95]. To generate an iV-step 
self-avoiding walk, we generate two independent (iV/2)-step SAWs ("dimers") and 
attempt to concatenate them. If the result is self-avoiding, we are done; otherwise, 
we discard the two walks and start again from scratch. This procedure can now be 
repeated recursively: to generate each of the (iV/2)-step SAWs, we generate a pair of 
(7V/4)-step SAWs and attempt to concatenate them, and so on. For N < some cutoff 
Ao, we stop the recursion and generate the SAWs by some primitive method, such 
as non-reversal simple sampling. The dimerization algorithm can thus be written 
recursively as follows: 

title Dimerization (recursive version), 
function dim(A') 

comment This routine returns a random A^-step SAW. 

if N < N then 

to <— nrssamp(A^) 
return to 

else 

N x <- [A72J (integer part) 

]V 2 <_ N - N t 

start: 

lo^ <- dim(TVi) 

Lo^ <- dim(7V 2 ) 

u <— Cl/ 1 ) o cl/ 2 ) (concatenation) 
if to is not self-avoiding goto start 
return to 

endif 

It is easy to prove inductively that algorithm dim produces each A^-step SAW with 
equal probability, using the fact that the subroutine nrssamp does so. It is crucial 
here that after a failure we discard both walks and start again from scratch. 

A non-recursive description of this same algorithm is given by Suzuki [93]. 

Let us analyze [96] the efficiency of the dimerization algorithm under the scaling 
hypothesis 

c N w Afj N N 1 ' 1 (4.9) 

[cf. (2.1)]. Let Tjv be the mean CPU time needed to generate an A^-step SAW by 
algorithm dim. Now, the probability that the concatenation of two random (A^/2)-step 
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SAWs yields an iV-step SAW is 



PN = « B-'N-^ , (4.10) 

( C iV/2) 2 

where B = A/4 7-1 . We will need to generate, on average, 1/pn pairs of (A^/2)-step 
SAWs in order to get a single A^-step SAW; hence 

T N « BN 1 ~ 1 2T N/2 . (4.11) 

(We have neglected here the time needed for checking the intersections of the two 
dimers; this time is linear in N, which, as will be seen shortly, is negligible compared 
to the time 2Tjv/2 for generating the two dimers.) Iterating this k times, where 
k = log 2 (N / N ) is the number of levels, we obtain 



2(7-l)fc(fc-l)/2 - J ' no 



C < oiv c 1 iog 2 jv + c 2 ^ (4.12a) 



where 



Ci = ^ (4.13a) 

C 2 = ^_± + log 2 (2B) = ^l + l og2 A (4.13b) 

and Co depends on No- Thus, the growth of Tjy is slower than exponential in A^; but 
if 7 > 1 (which occurs for d < 4) it is faster than any polynomial in N. Fortunately, 
however, the constants C\ and C 2 are very small: see Table 3. For d = 2 (resp. d = 3) 
this means that in practice Tjy behaves like jV~ 2-3 up to A^ of order several thousand 
(resp. several million). In d = 4 the behavior is presumably ~A^ Cl 1 °s lo s JV + c '2 _ p or 
d > 5 we have C\ = and C 2 only very slightly larger than 1, so dimerization is 
extraordinarily efficient: a random A^-step SAW can be produced in a CPU time that 
grows only slightly faster than linearly in N. 

The efficiency can be improved further by choosing the first step of to avoid 
reversing the last step of . This effectively replaces A by [(2d— l)/2d] A in (4.13b), 
and hence C 2 by C' 2 = C 2 — \og 2 [2d / (2d — 1)]. See Table 3 for the effect of this change. 

It is an open question whether for d < 4 there exists any static Monte Carlo 
algorithm for generating a random A^-step SAW (with exactly uniform distribution) 
in a mean CPU time that is bounded by a polynomial in N. 

For discussion of some statistical issues related to dimerization, see [11, Section 
9.3.2]. 

Remark. A slightly different version of the dimerization algorithm was invented 
independently by Alexandrowicz [97, 98] and used subsequently by many others. In 
this version, one begins by producing a large initial batch of Af-step SAWs, where M 
is some small number (ss Ao); one then joins randomly chosen pairs to form walks of 
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d 


fi (est.) 


7 (est.) 


A (est.) 


c\ 


c 2 


C 2 


2 


2.6381585 


43/32 


1.1775 


11/64 « 0.172 


0.72 


0.31 


3 


4.6839066 


1.162 


1.1845 


0.081 


1.00 


0.74 


4 


6.7720 


1 (xlog 1 / 4 ) 










5 


8.83861 


1 


1.25 





1.32 


1.17 


6 


10.87879 


1 


1.16 





1.21 


1.09 


all d > 5 




1 


< 1.493 





< 1.58 


< 1.58 



Table 3: Efficiency of dimerization algorithm as a function of lattice dimension d. 
Estimates are obtained by extrapolation of the available counts cjv [11, Tables C.l 
and C.4]. The last line is a rigorous bound valid for all d > 5 [11, p. 172]. The 
case d = 4 is somewhat anomalous, as it is believed [14] that cjv ~ fJ N (log iV) 1 / 4 in 
contrast to the usual power-law behavior (2.1). 



length 2M , and so forth. The trouble with this method is that the same subchains of 
length Af, 2Af, 4Af, etc. tend to occur repeatedly in the SAWs produced. Thus, the 
sample of SAWs produced is both slightly biased (SAWs with two or more identical 
subchains of length M are favored) and somewhat correlated, but it is difficult to 
assess these effects quantitatively. Both effects can be reduced by using an extremely 
large initial batch of SAWs, and by making not too many dimerization attempts 
per batch, but this is likely to be inefficient or unreliable or both. In my opinion 
Alexandrowicz' version of dimerization should not be used; algorithm dim is simpler, 
more efficient, and — above all — is correct. 

5 Quasi-Static Monte Carlo Methods for the SAW 
5.1 Quasi-Static Simple Sampling 

In each of the foregoing static methods, a slight improvement in efficiency can 
be obtained by working with several different values of N at once, and noticing that 
"a walk that intersects itself for the first time at the M th step provides [unbiased] 
instances of A^-step self-avoiding walks for all N < Af" [63, p. 129]. The resulting 
method is quasi-static in our classification: each pass through the algorithm produces 
a batch of SAWs (of various lengths) which are highly correlated among themselves, 
although successive batches are independent of each other. Unfortunately — and this 
seems to be characteristic of quasi-static methods — it appears difficult to estimate 
quantitatively the degree of correlation between the various SAWs in a given batch, 
and therefore difficult to estimate the actual statistical efficiency of the method. 
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5.2 Enrichment 



One method of generating long SAWs with much less attrition than (non-reversal) 
simple sampling is the enrichment technique [99]: if a walk survives to s steps, 
then several (t) copies are made of it and each copy is used independently as a starting 
point for further attempts to add steps; and likewise at 2s, 3s, etc. (The same idea 
was used earlier in Monte Carlo work on neutron-transport problems [63, Section 
8.2].) The free parameters s and t must be chosen judiciously (see below). This 
method is quasi-static: the SAWs produced in a single pass through the algorithm 
(i.e. the progeny of a single s-step SAW) are manifestly correlated, since they all 
have the same initial s steps and many of them have the same initial 2s, 3s, . . . steps 
as well. However, as before, it is difficult to assess this correlation quantitatively. 
(Indeed, the quasi-static simple sampling method can be considered to be the special 
case of the enrichment method with s = t = 1.) 

The enrichment algorithm can be analyzed semi-quantitatively as follows [100] 
[11, Section 9.3.3]: Let M ns (a random variable) be the number of ns-step walks that 
are produced in a single pass through the algorithm; by definition of "single pass" 
we have M s = 1. (The data produced by successive passes through the algorithm 
are obviously independent.) A SAW of length ns gives rise to t copies, each of which 
survives to length (n + l)s with (average) probability 29 

c (n+1)s f fi y 

an ~ (2d-iy Cns ~ \2d-1) ' ( j 

So we can regard M s , M 2s , ■ ■ ■ as a branching process [102] in which M ns is the 
number of "individuals" alive in the n th generation. We assume that each individual 
reproduces independently, producing a number of "children" which is a binomial 
random variable with parameters t and a = [fi/(2d — l)] s . 30 

It is easy to see that (M ns ) = (ta) n . Thus, if ta < 1 there is exponential sample 
attrition, just as in simple sampling. If ta > 1, some (although not all) starts lead 
to an exponential explosion of progeny; this is undesirable, as great computational 
effort will be expended in producing these samples, but the information contained in 
them is less than proportional to that effort, because they are highly correlated. The 
most interesting case is ta = 1: every start dies out eventually, but the mean lifetime 
of a start is infinite. More precisely, it can be shown [102] that 

Prob(M ns = 0) « 1-4- (5.2) 

Prob(M ns > (3n) « -H-er 2 ^ 2 (^ > q) (5.3) 

(7 2 n 



29 For large n, one has [from (2.1)] a n m ( 2( ^_ 1 ) s (l + ^) 7 — ► ( 2 d-i ) S as n ~^ °°> irrespective of 
the value of s or of 7. For even s this convergence is actually a rigorous theorem [101]. 

30 This assumption is not really correct: the trouble is that some SAWs will have higher or lower 
"fertility" than others (i.e. be harder or easier to intersect with); and these fertilities are somewhat 
correlated between different walks in the process, as all these walks share some common segments 
(the degree of correlation obviously depends on the relative location of the two SAWs in the "family 
tree"). Nevertheless, this assumption seems to be a reasonable approximation. 
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where a 2 = (1 — a)t = t — 1. Thus, M ns is nonzero only with probability of order 
1/n; but when it is nonzero it is typically of order n. 31 On the other hand, it is not 
hard to see that the mean CPU time per start is (when ta = 1) of order n times 
the CPU time needed to produce a single s-step segment. 32 So even if we make the 
over-pessimistic assumption that all the walks in the n th generation of a given start 
are perfectly correlated (and thus carry only as much information as one walk), it 
follows that we can obtain one independent ns-step walk in a CPU time of order n 2 
(i.e. ~n starts each taking a CPU time ~n). This ~iV 2 behavior is exceptionally 
good: it is better than all known static algorithms (e.g. dimerization) in dimension 
d < 4, and it is comparable to most of the new dynamic algorithms (see Section 6). 
The enrichment algorithm definitely deserves a systematic theoretical and empirical 
study. 

Remarks. 1. The integer t can also be a random variable; in this case the role of 
ta is played by (t)a. This generalization is useful in permitting fine-tuning of (t)a. 

2. The ~ iV 2 obtained here may really be ~ iV 7+1 if one takes account of the 
corrections to the sign in (5.1). See also Sections 5.3 and 6.6.1. 

5.3 Incomplete Enumeration (Redner- Reynolds Algorithm) 

Incomplete enumeration [104, 105] is a quasi-static algorithm that generates 
a batch of SAWs from the variable-iV, variable- £ ensemble (2.17) with p = 0. The 
idea is to take a standard algorithm for systematically enumerating all self-avoiding 
walks up to some length N maX} and modify it so that it pursues each branch of the 
"SAW tree" only with some probability f3 < 1: 

31 This prediction is strikingly confirmed by Grishman's [103] empirical observations for ta Pi 1 
and large but fixed n: "if the enrichment parameters are adjusted to make the total number of 
generated walks approximately equal to the number of starts, one will find that most of the starts 
'die out' before generating any walks, while a few starts each lead to a large number of walks." 

32 Here I assume that one tests self-avoidance using a method that takes a CPU time of order 1 
per added step (see Section 7.1.2). 
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title Incomplete enumeration (recursive version), 
subroutine incenum(o;, (3, N max ) 

comment This routine performs an incomplete enumeration of the SAW tree 
beginning at u, with probability parameter /3, up to maximum length N max 

append u to output list 
if \uj\ > N max return 
for i = 1 to 2d do 

U <— random number uniformly distributed on [0,1] 

iiU < (3 then 

u' <— one-step extension of u in direction i 
if u' is self-avoiding incenum(u/, (3, N max ) 

endif 
enddo 
return 

To perform an incomplete enumeration of the SAW tree beginning at the "root", one 
simply invokes incenum({0}, (3, N max ) } where {0} is the zero-step walk at the origin. 
Typically one chooses (3 < (3 C = 1///; in this case it is safe to set N max = oo. (If one 
sets N max = oo when (3 > (3 C} then with nonzero probability the algorithm will run 
forever!) 

A non-recursive implementation of incomplete enumeration can be found in [11, 
Section 9.3.3]. 

Incomplete enumeration is very closely related to enrichment: indeed, it is nearly 
identical to the enrichment algorithm in which 5 = 1 and t is a binomial random vari- 
able with parameters 2d and (3. The only difference is that the enrichment algorithm 
performs "sampling with replacement" among the various one-step extensions of u 
(i.e. the same u' could occur more than once), while incomplete enumeration per- 
forms "sampling without replacement" (each u' occurs at most once). So one expects 
incomplete enumeration to be slightly more efficient, in that the resulting batch of 
SAWs will be less correlated. (Indeed, there may even be some (mh'correlation arising 
from the sampling without replacement: this certainly occurs if, for example, (3 = 1. 
On the other hand, for ^ < 1 one still has strong positive correlation for the same 
reason as in the enrichment algorithm: many walks share the same initial segments.) 

The CPU time for one invocation of incomplete enumeration is obviously propor- 
tional to the total number of walks encountered during the enumeration. 33 On the 
average this is 

oo 

z{p) = £ p N c N ~ (i - ~ (Ny . (5.4) 

N=0 

On the other hand, it is reasonable to guess that, as in enrichment, 

Prob(at least one A^-step SAW is produced) ~ 1/N for N ~ (N) . (5.5) 
33 Here I assume that one tests self-avoidance using an 0(1) algorithm (see Section 7.1.2). 
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So if we make the over-pessimistic assumption that all the walks of length N in a 
given batch are perfectly correlated (and thus carry only as much information as one 
walk), it follows that we can obtain one independent iV-step walk in a CPU time of 
order iV 7+1 (i.e. choose f3 so that (N) ~ iV, then make ~iV starts each taking a CPU 
time ~iV 7 ). 

This differs from the estimate made previously for enrichment, because 7 is in gen- 
eral slightly larger than 1 (namely 43/32 in dimension d = 2, sal. 16 in d = 3, and 1 in 
d > 4). However, I suspect that the correct answer in both cases is iV 7+1 , and that this 
would be obtained also in the branching-process analysis if one were to take account of 
the n-dependence of ( £ ) s (l -f 1 ]. The incomplete-enumeration al- 

gorithm is also closely related to the slithering-tortoise algorithm discussed in Section 
6.6.1, and the same (iV) 2 vs. (iV} 7+1 issue arises there. 

In any case, the incomplete-enumeration algorithm is potentially quite efficient, 
and deserves a systematic theoretical and empirical study. 

6 Dynamic Monte Carlo Methods for the SAW 

In this section I discuss dynamic Monte Carlo methods for generating SAWs in 
various ensembles. I emphasize that the stochastic dynamics is merely a numerical 
algorithm, whose goal is to provide statistically independent samples from the desired 
distribution tt in the smallest CPU time possible. It need not correspond to any real 
"physical" dynamics! Indeed, the most efficient algorithms typically make non-local 
moves which would be impossible for real polymer molecules. 

6.1 General Considerations 

This subsection contains some exceedingly pedantic — but I hope useful — general 
considerations on dynamic Monte Carlo algorithms. 

The study of a Monte Carlo algorithm can be divided into three stages: 

• specification of the algorithm; 

• verification of the algorithm's validity; and 

• study (by mathematical, heuristic and/or numerical means) of the algorithm's 
efficiency. 

Let us consider these aspects in turn: 

To specify a dynamic Monte Carlo algorithm, we must specify three things: 

1) The state space S. 

2) The desired equilibrium measure tt. 

3) The transition probability matrix P = {p(io — > lo 1 )}. 



31 



Here S and tt specify the model that we wish to study, while P specifies the numerical 
method by which we propose to study it. 34 With regard to P, it is useful to subdivide 
the issue further, by specifying successively: 

3a) The set of allowed elementary moves, i.e. the transitions u — > u' for which 
p(u — >■ u') > 0. 

3b) The probabilities p(io — > u') for the allowed elementary moves. 

After specifying the algorithm, we must prove the algorithm's validity. As dis- 
cussed in Section 3.2, this has two parts: 

(A) We must prove ergodicity (irreducibility), i.e. we must prove that we can 
get from any state u £ S to any other state u' £ S by some finite sequence of 
allowed elementary moves. 

(B) We must prove the stationarity of 7r with respect to P. 

Ergodicity is a combinatorial problem; it depends only on the set of allowed elemen- 
tary moves, not on their precise probabilities. [This is the motivation for distinguish- 
ing sub-questions (3a) and (3b).] For many SAW algorithms, the proof of ergodicity 
is highly non-trivial. Indeed, in several embarrassing cases, an algorithm was used 
for many years before it was realized to be non-eigodic. As for stationarity, this is 
usually (though not always) verified by showing that P satisfies detailed balance for 
7r [cf. (3.37)] or is built up out of constituents Pi, . . . , P n which do so [as discussed at 
the end of Section 3.2]. 

Once the algorithm is known to be valid, we can investigate its computational effi- 
ciency , as measured by the amount of CPU time it takes to generate one "statistically 
independent" sample from the distribution tt. This study also has two parts: 

i) Investigate the autocorrelation times r,- ni|J 4 (for observables A of interest) 
and T exp . 

ii) Investigate the computational complexity of the algorithm, i.e. the mean 
CPU time per iteration (henceforth denoted Tcpu)- 

The CPU time per "statistically independent" sample is then 2Ti ntj ATcpUi and this 
provides a criterion for comparing algorithms: for given S and 7r, the best algorithm 
is the one with the smallest product Ti ni ^cpu ■ (This criterion may of course depend 
on the observable A.) Both aspects (i) and (ii) should be studied by all the methods 
at our disposal: rigorous mathematical analysis, heuristic physical reasoning, and 
numerical experimentation. Since we are primarily interested in long self-avoiding 
walks, we will emphasize the asymptotic behavior of r m(i i, r exp and Tcpu as N — > oo 
(or (N) — > oo for the variable- N algorithms). Usually these quantities behave as some 
power of N or (N), in which case we can identify a dynamic critical exponent. 

34 0f course, the choice of P trivially determines S; and if P is ergodic (as it should be if the 
algorithm is to be valid), then it also determines ir. But it is nevertheless useful conceptually to 
distinguish the three ingredients, which typically correspond to the chronological stages in inventing 
a new Monte Carlo algorithm. 
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Remarks. 1. We are here always considering the relaxation time of the auto- 
correlation functions in equilibrium , as it is this relaxation time which is of primary 
importance for determining the statistical accuracy of the Monte Carlo method (see 
Section 3.2). Some studies [106, 107, 108, 109] have focussed instead on the relax- 
ation to equilibrium from a highly non-equilibrium initial state; but as Kranbuehl 
and Verdier [110] have pointed out, this quantity may well have a strictly smaller 
dynamic critical exponent. An analogous situation arises in the Glauber dynamics 
for the Ising model [111]. 35 

2. In this article we always measure time in units of attempted elementary moves. 
Much of the literature on dynamic SAW models uses a time scale of attempted elemen- 
tary moves per bead; autocorrelation times expressed in this way should be multiplied 
by N (actually N + 1) before comparing them to the present paper. 

6.2 Classification of Moves 

The elementary moves in a SAW Monte Carlo algorithm can be classified according 
to whether they are 

• A^-conserving or A^-changing 

• endpoint-conserving or endpoint-changing 

• local or non-local 

Obviously, fixed- A" algorithms must use only A^-conserving moves, while variable- A" 
algorithms are free to use both A^-conserving and/or A^-changing moves (and indeed 
must use some of the latter in order to satisfy ergodicity). An analogous statement 
holds for fixed- £ and variable- £ algorithms with regard to endpoint-conserving and 
endpoint-changing moves. The distinction between local and non-local moves will be 
explained later. 

Within these limitations, the elementary moves to be discussed below can be 
mixed more or less freely to make "hybrid" algorithms. The art is, of course, to find 
a useful combination. (A cocktail can be made by mixing any set of liqueurs in any 
proportions, but there is no guarantee that the resulting concoction will taste good! 
In particular, it may or may not taste better than the individual ingredients taken 
separately.) Thus, a hybrid algorithm is useful when its performance is superior to 
that of either of its "pure" components, in the sense of having a smaller product 
T~int,ATcpu • Roughly speaking, this occurs when the "slow modes" under one type of 
move are speeded up by the other type of move. (An extreme case of this is when 
the combined algorithm is ergodic while both pure algorithms are non-ergodic.) 

35 In principle, the asymptotic behavior as t — ► oo of the relaxation from nonequilibrium is con- 
trolled by the same r exp as the autocorrelation in equilibrium. But to observe this asymptotic 
relaxation in practice would require enormous statistics (e.g. tens of thousands of repeat runs). 
Most of these studies have focussed instead on the initial relaxation — e.g. the time to relax to 
within 1/e [106, 109], the time(s) in a phenomenological fit [107, 108], or the area under the re- 
laxation curve [112, 113, 111] — and all of these are likely to exhibit a smaller dynamic critical 
exponent than either r exp or Ti ntt A- 
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Note that a hybrid algorithm always contains one or more free parameters — 
namely the relative probabilities of the different types of moves — which can be 
tuned to optimize the computational efficiency. (Often this requires a tradeoff between 
autocorrelation time and computational complexity, particularly if one move is local 
while the other is non-local.) This optimization is a non-trivial problem, and it must 
be approached systematically if the results are to be of any use. (The behavior of 
the algorithm at some arbitrarily chosen set of parameter values may be completely 
misleading.) A priori there are three possibilities: 

a) One of the pure algorithms is superior to any non-trivial hybrid. 

b) The optimal hybrid is better than either of the pure algorithms, but only by a 
bounded factor (e.g. 2 or fO or whatever). 

c) The optimal hybrid has a better dynamic critical exponent than either of the 
pure algorithms, so that its superiority factor grows without bound as N — > oo. 

The most interesting hybrid algorithms are of course those of type (c). But those of 
type (b) should not be sniffed at, if the gain is large enough (and the human effort 
required to find an optimal or nearly-optimal mixture is not too large). 

Most of the elementary moves to be discussed below have the property that the 
resulting walk u' is not guaranteed to be self-avoiding. Therefore, it is necessary to 
test whether u' is self-avoiding (see Section 7.1.2 for methods for doing this). If u' is 
self-avoiding, then the proposed move is accepted; otherwise, it is rejected, and the 
old walk is counted once again in the sample. This procedure can be understood as 
the Metropolis criterion for the energy function 



One final remark: Up to now we have adopted the convention that all walks start 
at the origin (lo = 0) unless specified otherwise. However, some of the moves to be 
discussed below may alter the initial site of the walk so that it is no longer at the 
origin. One could, of course, imagine that the resulting walk is then translated back 
to the origin. But a more convenient approach is to consider all translates of a given 
SAW to be equivalent; then we need not worry where the initial point is. 36 This is 
how the algorithms are most efficiently implemented in practice. (Every once in a 
while one should translate the walk back to the origin, just as a precaution against 
integer overflow.) 

6.3 Examples of Moves 

6.3.1 Local iV-Conserving Moves 

A local move is one that alters only a few consecutive sites ("beads") of the SAW, 
leaving the other sites unchanged. Otherwise put, a local move excises a small piece 

36 In mathematical language, we redefine Sn to be the set of equivalence classes of TV-step SAWs 
modulo translation. (And likewise for the other state spaces.) Note also that the observables (2.3)- 
(2.5) are unaffected by translation of the walk, hence they are well-defined on equivalence classes. 
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from the original SAW and splices in a new local conformation in its place. (Of course, 
it is always necessary to verify that the proposed new walk is indeed self- avoiding.) 
In this subsection we concentrate on iV-conserving local moves, i.e. those in which 
the excised and spliced-in subwalks have the same number of beads. 

More precisely, let u and u' be iV-step SAWs, and let k be an integer > 1. 
We say that to and to 1 are interconvertible by a &-bead move if there exists 
an index i m in (0 < imin < N — k -\- 1) such that Ui = cu' for all i except possibly 
i = imin, imin + 1, • • • , imin + k - 1. (We shall also assume that u imin ^ ^[ mm and 
LOi min+ k-i 7^ ^imin+k-n smce otherwise u and u' would be interconvertible also by 
some move of less than k beads.) If i m in = or iV - & + 1, we call this an end-group 
&-bead move; otherwise we call it an internal &-bead move. Clearly, internal moves 
are endpoint-conserving, while end-group moves are not. 

In Figure 1 we show all the possible one-bead moves (on a hypercubic lattice). 
Move A is a "one-bead flip" (also called "kink-jump"); it is the only one-bead internal 
move. Moves B and C are end-bond rotations. 

In Figure 2 we show all the possible internal two-bead moves. Move D is a "180° 
crankshaft". Move E is a "90° crankshaft"; of course it is possible only in dimension 
d > 3. Move F is a "two-bead L-flip". Move G permutes three successive mutually 
perpendicular steps (which lie along the edges of a cube); again this is possible only 
in dimension d > 3. 

We leave it to the reader to construct the list of two-bead end-group moves. There 
are numerous three-bead moves; a few interesting ones are shown in Figure 3. 

6.3.2 Bilocal A^-Conserving Moves 

A bilocal move is one that alters two disjoint small groups of consecutive sites 
(or steps) of the walk; these two groups may in general be very far from each other. 
One trivial way of making an A^-conserving bilocal move is to make two independent 
(nonoverlapping) A^-conserving local moves. Here we are interested in the nontrivial 
A^-conserving bilocal moves, i.e. those in which one subwalk loses sites and the other 
subwalk gains them. Instead of formalizing the concept, let us simply give some 
examples: 

• The slithering-snake (or reptation) move, which deletes a bond from one end 
of the walk and appends a new bond (in an arbitrary direction) at the other 
end [Figure 4]. 

• The kink transport move, which deletes a kink at one location along the walk 
and inserts a kink (in an arbitrary orientation) at another location [Figure 5]. 

• The kink-end reptation move, which deletes a kink at one location along the 
walk and appends two new bonds (in arbitrary directions) at one of the ends of 
the walk [Figure 6 >]. 

• The end-kink reptation move, which deletes two bonds from one of the ends 
of the walk and inserts a kink (in an arbitrary orientation) at some location 
along the walk [Figure 6< ]. 
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Figure 1: All one-bead local moves. (A) One-bead flip. (B) 90° end-bond rotation. 
(C) 180° end-bond rotation. 
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Figure 2: All internal two-bead local moves. (D) 180° crankshaft. (E) 90° crankshaft 
(d > 3 only). (F) Two-bead L-flip. (G) Cube permutation (d > 3 only). 



(H) 



— •- 
•- 



-• • 



(I) 



• •- 



Figure 3: Some selected internal three-bead local moves. (H) Three-bead J-flip. (I) 
Three-bead 180° crankshaft. 
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Figure 4: The slithering-snake (reptation) move. Head of the walk is indicated by X . 
Dashed lines indicate the proposed new step (resp. the just-abandoned old step). 
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Figure 5: The kink-transport move. A kink has been cleaved from AB and attached at 
CD. Note that the new kink is permitted to occupy one or both of the sites abandoned 
by the old kink. 




Figure 6: The kink-end reptation ( >) and end-kink reptation (< ) moves. In >, 

a kink has been cleaved from AB and two new steps have been attached at the end 
marked X . Note that the new end steps are permitted to occupy one or both of the 
sites abandoned by the kink. 
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Figure 7: The pivot move (here a +90° rotation). The pivot site is indicated with an 
X. Dashed lines indicate the proposed new segment. 



6.3.3 Non-Local iV-Conserving Moves 

Here we move definitively out of the realm of systematic classification and into the 
realm of ingenuity. The possibilities for non-local moves are almost endless, but it is 
very difficult to find one which is useful in a Monte Carlo algorithm. There are two 
reasons for this: Firstly, since a non-local move is very radical, the proposed new walk 
usually violates the self- avoidance constraint. (If you move a large number of beads 
around, it becomes very likely that somewhere along the walk two beads will collide.) 
It is therefore a non-trivial problem to invent a non-local move whose acceptance 
probability does not go to zero too rapidly as N — > oo. Secondly, a non-local move 
usually costs a CPU time of order N (or in any case N' p with p > 0), in contrast to 
order I for a local or bilocal move. It is non-trivial to find moves whose effects justify 
this expenditure (by reducing r,- ni|J 4 more than they increase Tcpu)- 

The following paragraphs are, therefore, nothing more than a brief listing of those 
non-local moves which, as of 1993, have been demonstrated to be useful for Monte 
Carlo algorithms in at least one of the SAW ensembles. Tomorrow someone could 
invent a new and even better non-local move; I hope that some reader of this essay 
will be moved (pardon the pun) to do so. This is a wide-open area of research. 

Only two broad types of useful non-local moves are known at present: pivot moves, 
and cut-and-paste moves. 

In a pivot move, we choose some site Uk along the walk as a pivot point, and 
apply some symmetry operation of the lattice (e.g. rotation or reflection) to the part 
of the walk subsequent to the pivot point, using the pivot point as the origin [Figure 
7]. That is, the proposed new walk is 



where g is the chosen symmetry operation. 

In a cut-and-paste move, we cut the walk into two or more pieces, invert and/or 
reflect and/or permute the pieces, and finally reassemble the pieces. For example, 




Ui for < i < k 

Uk + g(coi — Uk) for k + I < i < N 






(6.3) 
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Figure 8: Inversion of the subwalk uP'^. 
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Figure 9: Some local iV-changing moves. (J) Kink insertion (AiV = +2). (K) Kink 
deletion (AiV = -2). (L) End-bond addition (AiV = +1). (M) End-bond deletion 
(AiV = -1). 



More generally, one may apply to any lattice symmetry operation that leaves 

Uk and ui invariant; or any lattice symmetry operation that interchanges Uk and u>i, 
followed by the inversion (6.3). Such symmetry operations exist if u>i — Uk lies on 
certain axis or diagonal hyperplanes. 

Empirically, the pivot and cut-and-paste moves have a reasonable acceptance prob- 
ability even at large iV, of order ~iV~ p with p typically of order 0.1-0.4. The heuris- 
tic reason is that these moves conserve most of the chain structure (and hence its 
self- avoidance), except near the pivot point or cut point (s). As a result, the accep- 
tance probability for a move involving a single pivot or cut point is roughly similar 
to that encountered when concatenating two independent (iV/2)-step walks, namely 
c n/ c n/2 ~ iV~( 7-1 ). By the same reasoning, a move with n cut points might be 
expected to have an acceptance probability roughly like ~iV~ n ( 7_1 ). 

6.3.4 Local iV-Changing Moves 

Recall that a local move is one that excises a small piece from the original SAW 
and splices in a new local conformation in its place. An iV-changing local move has 
the freedom to splice in a piece with fewer or more sites than the original piece. 

The simplest internal local iV-changing moves are kink insertion and kink deletion 
(J and K in Figure 9); these have AiV = +2 and AiV = —2, respectively. The simplest 
end-group local iV-changing moves are end-bond addition and end-bond deletion (L 
and M in Figure 9); these have AiV = +1 and AiV = — 1, respectively. 

Note that each local iV-changing move is simply "one half" of some bilocal move. 
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Elementary Moves 


Autocorrelation Time r 


Scheme 


References 


(see Figs. 1-2) 


(in elementary moves) 


Verdier-Stockmayer 


[106, 115, 116] 


A,B 


~ N~ 3 + 2v (?) 


Modified 


[116, 109] 


A,B,C 


~ N~ 3 + 2v (?) 


Verdier-Stockmayer 








Heilmann II 


[107, 108, 117] 


A,B,E 


~ N~ 2 + 2v (?) 


Birshtein et al./ 


[118, 108] 


A,B,D 


~ N~ 2 + 2v (?) 


Heilmann-Rotne 3 








Taran-Stroganov 


[119] 


A,B,D,E 


~ N~ 2 + 2v (?) 


Verdier-Kranbuehl 


[120, 121] 


B,D,F 


~ N~ 2 + 2v (?) 


Kranbuehl-Verdier 


[110, 122, 123] 


A,B,D,F 


~ N~ 2 + 2v (?) 


Delbriick 


[114] 


most one- and two-bead 


~ N~ 2 + 2v (?) 


Meirovitch 


[124] 


most three-bead 


~ N~ 2 + 2v (?) 


Lai et al. 


[126, 127] 


all three-bead 


~ N~ 2 + 2v (?) 


Geny-Monnerie / 


[128, 129, 130, 131] 


some two- and three-bead 


~ N~ 2 + 2v (?) 


Kremer et al. 




(tetrahedral lattice) 





Table 4: Some local iV-conserving algorithms. 



6.4 Fixed-iV, Variable-x Algorithms 
6.4.1 Local Algorithms 

Historically the earliest dynamic Monte Carlo algorithms for the SAW were local 
iV-conserving algorithms: they date back to the work of Delbriick [114] and Verdier 
and Stockmayer [106], both published in 1962. During the subsequent two decades, 
numerous variants on this theme were proposed (see Table 4). Most of these algo- 
rithms employ some subset of moves A-F from Figures 1 and 2. 

Unfortunately, all local A^-conserving algorithms have a fatal flaw: they are non- 
ergodic. The nonergodicity of the Verdier-Stockmayer algorithm was noticed as early 
as 1968 [107, 115], but this fact did not seem to deter usage of the algorithm, or to 
provoke serious discussion about the ergodicity or nonergodicity of related algorithms 
(one exception is [108]). Finally, in 1987 Madras and Sokal [125] proved that all 
local A^-conserving algorithms for SAWs in dimensions d = 2, 3 are nonergodic for 
sufficiently large A^. 37 Furthermore, they proved that for large N, each ergodic class 
contains only an exponentially small fraction of the SAW configuration space. (These 
results are probably true also in dimension d > 4, but have not yet been proven.) 

This nonergodicity is in fact quite easy to see: Consider the double cul-de- 
sac configuration shown in Figure 10(a). This SAW is completely "frozen" under 
elementary moves A, B, D and F: it cannot transform itself into any other state, nor 
can it be reached from any other state. It follows that the original Verdier-Stockmayer 
algorithm [106, 115] and most of its generalizations [107, 108, 117, 118, 119, 120, 110, 



For algorithms based on moves of k or fewer beads, nonergodicity arises in dimension d = 2 for 
all N > 16fc + 63, and for quite a few smaller N as well [125, Theorem 1]. 
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(a) (b) (c) 

Figure 10: Some "double cul-de-sac" configurations, frozen in the Verdier-Stockmayer 
algorithm and its generalizations. 



122, 123] are nonergodic (in d = 2) already for N = 11. If move C is allowed, then 
the configuration of Figure 10(a) is no longer frozen, but that of Figure 10(b) still 
is. Thus, any algorithm based on moves A-D and F is nonergodic (in d = 2) for 
N = 15. If two-bead end-group moves are allowed, then the configuration of Figure 
10(b) is no longer frozen, but that of Figure 10(c) still is. Thus, any algorithm based 
on one-bead and two-bead moves is nonergodic (in d = 2) for N = 19. 

When three-bead moves are allowed, it is not sufficient simply to make the double 
cul-de-sac taller. Indeed, any double cul-de-sac of the kind shown in Figure 10, no 
matter how tall, can be unfolded into a straight rod by repeated use of the moves 
A, B and H. (The reader might find it amusing to work out the required sequence 
of moves.) But only one additional trick is needed: by folding the double cul-de-sac 
once more, as in Figure 11, a frozen configuration can be obtained for the k-he&d 
algorithm for arbitrary k. This is the idea behind the Madras- Sokal proof. 

The nonergodicity of the Verdier-Stockmayer algorithm due to double culs-de-sac 
was noticed already by Verdier [115] in 1969. 

An entirely different type of nonergodicity arises in dimension d = 3 (and only 
there) because of the possibility of knots, as was first pointed out by Heilmann [107] 
in 1968. The simplest knotted configuration 38 is shown in Figure 12: it has N = 18, 
and although it is not completely "frozen", it nevertheless cannot be deformed to a 
straight rod using moves A-F. It is likely that analogous knots can be constructed 
for the k-he&d algorithm for arbitrary k. 

For additional historical discussion, along with extensive discussion of the practical 
implications of nonergodicity, see [125, Sections 3 and 4]. 

Since the local A^-conserving algorithms are nonergodic, their autocorrelation 
times T exp and r,- ni|J 4 are by definition +oo: the simulation never converges to the 
correct ensemble average. Nevertheless, we can imagine starting the simulation in 
some particular configuration (e.g. a straight rod), and ask how long it takes to ex- 

38 This is not a knot in the true topological sense, since true knots can occur only for closed walks 
(ring polymers). We are therefore here using the word "knot" in a loose sense to describe the general 
shape of the chain. 
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Figure 11: A "folded double cul-de-sac" configuration, drawn here for a = 3. Such 
configurations are frozen under under all k-he&d local iV-conserving moves with k < a. 
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Figure 12: A "knot" that cannot be deformed to a straight rod using moves A-F. 
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plore adequately the ergodic class of that particular configuration. Here is a heuristic 
argument [71] giving a crude estimate of r for such a restricted simulation: 

Let us consider, as a typical quantity, the evolution of the squared radius of 
gyration of the chain. At each elementary move, a few beads of the chain move a 
distance of order 1; it follows from (2.4c) that the change in R 2 is of order N v ~ x . In 
order to traverse its equilibrium distribution, R 2 must change by something of order 
its standard deviation, which is ~N 2v . Assuming that R 2 executes a random walk, it 
takes about (N 2v /A^ -1 ) 2 ~ N 2+2v elementary moves for this to occur. So we predict 
r ~ ]V 2 + 2i/ . 39 This basic estimate ought to be applicable to the dynamics of ordinary 
random walks (ORWs, free-flight chains) and non-reversal random walks (NRRWs) 
as well as SAWs. 

This reasoning can easily be converted into a more-or-less rigorous proof of the 
lower bound T exp > Tint,R\ > const X N 2+2v (see [132, p. 51] for a slight variant). 
However, the true r could be significantly larger than this estimate if there are modes 
which relax essentially more slowly (i.e. with a larger dynamic critical exponent) than 
the radius of gyration. It also could be wrong if there are, in a particular model, 
special conservation laws or quasi-conservation laws which inhibit the relaxation of 
the radius of gyration; this indeed can occur (see the next paragraph). Finally, it 
is not clear whether the result r ~ N 2+2v is exact — and if so, for which r — 
or is merely a reasonable first approximation. Usually dynamic critical exponents 
are not expressible in terms of static ones, except for trivial (Gaussian-like) models. 
It is thus probable that r ~ N 2+2v is not exact for the SAW, although it may be 
exact for the NRRW; it is known to be exact for the ORW, at least in the Verdier- 
Stockmayer dynamics [133, 134, 135, 136, 137, 138, 139, 140]. The numerical evidence 
is inconclusive (see references in [71]). 

The Verdier-Stockmayer (pure one-bead) and Kranbuehl-Verdier (pure two-bead) 
algorithms for the SAW or NRRW (but not the ORW) have peculiar conservation laws 
which inhibit the relaxation of the chain, thereby increasing the relaxation time above 
the usual N 2+2v by at least an extra factor of N [116, 121]. 40 These conservation laws 
can be broken by mixing one-bead and two-bead moves. 

39 The same order of magnitude is obtained if one considers the evolution of the end-to-end distance 
vector, the vector from the center-of-mass to an endpoint, or any similar global quantity. 

40 Hilhorst, Deutch and Boots [116, 121] have given a very beautiful analysis of these dynamics, 
using a mapping onto a gas of one-dimensional random walkers interacting via exclusion. It would 
be interesting to reexamine these questions using mathematical results on this latter model [141, 
142, 143]; it might be possible to compute the exact asymptotic behavior, at least in the NRRW 
case. 
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Figure 13: A "double cul-de-sac" configuration which is frozen in the reptation algo- 
rithm. 



6.4.2 Bilocal Algorithms 

The slithering-snake (reptation) algorithm 41 was invented independently by 
Kron [144, 145] and Wall and Mandel [146, 147]. 42 The elementary moves of this 
algorithm are "slithering" motions: one step is appended at one end of the walk, and 
one step is simultaneously deleted from the other end (Figure 4). Such a move is 
iV-conserving and bilocal (but not local). 

Unfortunately, the reptation algorithm is nonergodic, as was observed already by 
the original inventors [145, 146]. In this algorithm, frozen configurations occur when 
both ends of the walk are trapped in culs-de-sac. The simplest such configuration is 
shown (for d = 2) in Figure 13, and has N = 14. 43 

As with the local A^-conserving algorithms, we can inquire about the autocorre- 
lation times of a simulation within a particular ergodic class. A plausible heuristic 
argument [147] suggests that r ~ A^ 2 . This is because the SAW transforms itself by 
random back-and-forth slithering along the chain; after ~A^ 2 moves, this slithering 

41 The "slithering-snake (reptation) algorithm" should not be confused with the "reptation conjec- 
ture" of de Gennes [5, Section VIII. 2], who argues that the most rapid modes of polymer diffusion in 
the true physical dynamics (in a melt consisting of many entangled polymer chains) are the slithering 
modes. 

42 There are at least three variants of the slithering-snake scheme. In the first version of Wall 
and Mandel [146, pp. 4592-4593], the "head" and "tail" of the chain are interchanged only when 
an attempted move is rejected; this satisfies the stationarity condition but not the detailed-balance 
condition. (In fact, one step of the algorithm followed by a head-tail interchange satisfies detailed 
balance [148]. This is analogous to the situation in the Hamilton/Langevin hybrid algorithm for 
spin models [37, p. 237, footnote 18].) In the second version of Wall and Mandel [146, top of 
p. 4594], adopted also by Webman et al. [149], the "head" and "tail" are chosen anew at each 
cycle, randomly with 50%-50% probability; this satisfies detailed balance. The original algorithm 
proposed by Kron [144] is much more complicated than either of the preceding versions; it appears 
to satisfy stationarity but not detailed balance. 

43 The superficial resemblance between Figures 10-11 and 13 is, however, very misleading: the 
nonergodicities in the two types of algorithm are of radically different natures. In the local TV- 
conserving algorithms, nonergodicity is caused by the occurrence of a frozen conformation anywhere 
along the walk. In the reptation algorithm, by contrast, nonergodicity occurs only if both of the 
ends of the walk are trapped in culs-de-sac. It turns out that the ergodic class of the straight rod 
in the reptation algorithm contains at least a fraction 7V _ (t -1 )/ 2 of the SAW configuration space, 
whereas in the local algorithms it contains only an exponentially small fraction. See [125, Section 
3] for further discussion. 
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will have carried it N steps, and thus all the original bonds of the chain will have 
disappeared and been replaced by random new ones. (Alternatively, R 2 changes by 
an amount of order N 2v ~ x per elementary move; thus, it takes ~iV 2 elementary moves 
for R 2 to traverse a distribution of width ~ N 2v '.) It is easy to see that this argu- 
ment is exact for ORWs and NRRWs; it is not clear whether it is exact or merely 
approximate for SAWs. 

The nonergodicity of the reptation algorithm can be cured by adjoining addi- 
tional bilocal moves to the algorithm [150]. Indeed, the following bilocal (or mixed 
local/bilocal) algorithms are known to be ergodic: 

• reptation + kink-end and end-kink reptation [150] 

• kink-end and end-kink reptation + 90° end-bond rotation [148] 

• reptation + kink transport + one-bead flip, at least in dimension d = 2 [151] 

• kink transport + one-bead flip + 90° and 180° end-bond rotations, at least in 
d = 2 [151] 

All of these algorithms deserve systematic study, in particular of their dynamic critical 
behavior. One would like to know whether they all lie in the same dynamic univer- 
sality class, and whether the conjecture r ~ A^ 2 is exact, approximate or wrong. 

An alternative way of making the reptation algorithm ergodic is to switch to 
the variable- N ensemble and introduce A A" = ±1 moves (L and M in Figure 9), 
as proposed by Kron et al. [145]. But once one does this, there is no great reason 
to retain the "slithering-snake" moves; one can just as well use only the AA" = ±1 
moves. This leads to the slithering-tortoise (Berretti-Sokal) algorithm (see Section 
6.6.1). 

6.4.3 Pivot Algorithm 

The pivot algorithm has a curious history. It was invented by Lai [152] in 1969, 
and then promptly forgotten by almost everyone. 44 It was reinvented in 1985 by 
MacDonald et al. [161, 162, 163], and reinvented a short time later by Madras. This 
third reinvention led to a comprehensive analytical and numerical study by Madras 
and Sokal [96], which showed that the pivot algorithm is by far the most efficient 
algorithm yet invented for simulating the fixed- N, variable- £ SAW ensemble. After 
that the applications took off. 45 

The elementary move of the pivot algorithm is as follows: Choose at random a 
pivot point k along the walk (0 < k < N — I); choose at random an element g of the 
symmetry group of the lattice (rotation or reflection or a combination thereof); then 
apply g to the part of the walk subsequent to the pivot point (namely u>k+i, ■ ■ ■ , ^at), 

44 The only exceptions I know of are Olaj and Pelinka [153] and Clark and Lai [154]. Continuum 
analogues of the pivot algorithm were used by Stellman and Cans [155, 156], Freire and Horta [157], 
Curro [158, 159], Scott [160] and possibly others. For additional history, see [96, footnote 3]. 

45 Here is a very incomplete list: Caracciolo, Madras, Pelissetto, Sokal and collaborators [132, 164, 
165, 166, 39], Zifferer [167, 168, 169, 170, 171, 172], Bishop and collaborators [173, 174, 175, 176], 
Chorin [177], Eizenberg and Klafter [178]. 
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using Uk as the temporary "origin" [cf. (6.2)]. The resulting walk is accepted if it 
is self-avoiding; otherwise, it is rejected, and the old walk u is counted once more 
in the sample. It is easy to see that this algorithm satisfies detailed balance for the 
standard equal-weight SAW distribution. Ergodicity is less obvious, but it can be 
proven [96, 179]. 

At first thought this seems to be a terrible algorithm: for N large, nearly all the 
proposed moves will get rejected. In fact, this latter statement is true, but the hasty 
conclusion drawn from it is radically false! The acceptance fraction / does indeed go 
to zero as N — > oo, roughly like N~ p ; empirically, it is found that the exponent p is 
^0.19 in d = 2 [96] and wO.ll in d = 3 [96, 167, 178]. But this means that roughly 
once every N' p moves one gets an acceptance. And the pivot moves are very radical: 
one might surmise that after very few accepted moves (say, 5 or 10) the SAW will 
have reached an "essentially new" configuration. One conjectures, therefore, that 
the autocorrelation time r of the pivot algorithm behaves as ~ N' p . Things are in 
fact somewhat more subtle (see below), but roughly speaking (and modulo a possible 
logarithm) this conjecture appears to be true. On the other hand, a careful analysis of 
the computational complexity of the pivot algorithm (see also below) shows that one 
accepted move can be produced in a computer time of order N . Combining these two 
facts, we conclude that one "effectively independent" sample (at least as regards global 
observables) can be produced in a computer time of order N (or perhaps N\ogN). 
This is vastly better than the jV 2+2i/ ~ jV~ 3 - 2 of the local A^-conserving algorithms and 
the N~ 2 of the bilocal algorithms. Indeed, this order of efficiency cannot be surpassed 
by any algorithm which computes each site on successive SAWs, for it takes a time 
of order N simply to write down an A^-step walk! 

Let us mention briefly some relevant issues; the reader is referred to [96] for a 
fuller discussion. 

Variants of the pivot algorithm. Different variants of the pivot algorithm are 
obtained by specifying different distributions when we "choose at random": 

1) The pivot point k can be chosen according to any pre-set family of strictly 
positive probabilities p 0} . . . , pjv-i- The strict positivity (p^ > for all k) is necessary 
to ensure the ergodicity of the algorithm. Most work has used a uniform distribu- 
tion, but there could be some (probably minor) advantage in using a non-uniform 
distribution. 

2) Let G be the group of orthogonal transformations (about the origin) which 
leave invariant the lattice Z . Then the symmetry operation g £ G can be chosen 
according to any pre-set probability distribution {p g } ge o which satisfies p g = p g -i for 
all g, and which has enough nonzero entries to ensure ergodicity (see below). The 
condition p g = p g -i is easily seen to be both necessary and sufficient to ensure detailed 
balance with respect to the equal- weight distribution tt. 

In dimension d = 2, G is the dihedral group D 4 , which has eight elements: the 
identity, two 90° rotations, the 180° rotation, two axis reflections, and two diagonal 
reflections. Detailed balance holds provided that p +90 o rot = p_ 90 o rot . Ergodicity holds 
if the probabilities p g are nonzero for either 

• both diagonal reflections [179]; or 
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• ±90° rotations and the 180° rotation [96]; or 

• ±90° rotations and both axis reflections [96]. 

Most work has used a uniform distribution over the seven non-identity operations, 
but some gain could probably be achieved by using a non-uniform distribution. 

In general dimension d, the symmetry group of I d is the group 0(d, Z) of orthog- 
onal matrices with integer entries. In fact every such matrix is of the form 

gij = o"A',7r(j) , (6.4) 

where <Ti, . . . , = ±1 and tt is a permutation of {1, . . . , d}. Using this description, 
the pivot algorithm can be programmed very easily in any dimension. 

Acceptance fraction and autocorrelation time. Suppose we know that the accep- 
tance fraction / in the pivot algorithm behaves as / ~ N~' p as N — > oo. Then, 
as argued above, after a few successful pivots — i.e. a time of order 1/ f ~ N' p — 
the global conformation of the walk should have reached an "essentially new" state. 
Thus, we expect that for observables A which measure the global properties of the 
walk — such as the squared end-to-end distance R 2 e or the squared radius of gyration 
R 2 — the autocorrelation time r,- ni|J 4 should be a few times 1//. This is confirmed 
numerically [96, Section 4.3]. On the other hand, it is important to recognize that 
local observables — such as the angle between the 11 th and 18 t/l steps of the walk 
— may evolve a factor of N more slowly than global observables. For example, the 
observable mentioned in the preceding sentence changes only when serves as a 
successful pivot point; and this happens, on average, only once every N/ f attempted 
moves. Thus, for local observables A we expect r,- ni|J 4 to be of order N/ f. By (3.39), 
T exp must be of at least this order; and if we have not overlooked any slow modes in 
the system, then r exp should be of exactly this order. Finally, even the global observ- 
ables are unlikely to be precisely orthogonal [in / 2 (vr)] to the slowest mode; so it is 
reasonable to expect that T exPt A be of order N/ f for these observables too. In other 
words, for global observables A we expect the autocorrelation function pAAif) to have 
an extremely-slowly-decaying tail which, however, contributes little to the area under 
the curve. This behavior is illustrated by the exact solution of the pivot dynamics 
for the case of ordinary random walk [96, Section 3.3], and by numerical calculations 
for the SAW. 

The foregoing heuristic argument is, of course, far from a rigorous proof. It is not 
in general possible to find upper bounds on the autocorrelation time in terms of the 
acceptance fraction; the problem is that the state space could contain "bottlenecks" 
through which passage is unusually difficult. No one has suggested any reason why 
such bottlenecks should occur in the pivot algorithm, but neither does there exist any 
proof of their nonexistence. 

A heuristic argument [96, Section 3.2] suggests that / ~ N~' p with p = 7 — 1 (= 
11/32 in d = 2, ss0.16 in d = 3, and in d > 4). Quantitatively this prediction is 
incorrect; numerical experiments yield p 0.19 in d = 2 [96] and p 0.11 in d = 3 
[96, 167, 178]. However, the argument does correctly predict that p is small and that 
it gets smaller in higher dimension. It is an open question to find a better theoretical 
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prediction for p } and in particular to express it in terms of static exponents for the 
SAW. 



Computational complexity. A very important issue in any algorithm — but espe- 
cially in a non-local one — is the CPU time per iteration. By using a hash table (see 
Section 7.1.2), the self-avoidance of a proposed new walk can be checked in a time 
of order N. But one can do even better: by starting the checking at the pivot point 
and working outwards, failures can be detected in a mean time of order N 1 ~ p [96, 
Sections 3.4 and 4.4]. The mean CPU time per successful pivot is therefore ^N 1 ~ p for 
each of ~N P failures, plus for one success, or in all. Combining this with the 
observations made previously, we conclude that one "effectively independent" sample 
— as regards global observables — can be produced in a computer time of order N. 

Initialization. There are two main approaches: 

1) Equilibrium start. Generate the initial configuration by dimerization (Section 
4.3); then the Markov chain is in equilibrium from the beginning, and no data need 
be discarded. This approach is feasible (and recommended) at least up to N of 
order a few thousand. There is no harm in spending even days of CPU time on this 
step, provided that this time is small compared to the rest of the run; after all, the 
algorithm need only be initialized once. 

2) "Thermalization" . Start in an arbitrary initial configuration, and then discard 
the first ridisc ^ T ex P ~ N / f iterations. This is painful, b factor 
larger than r,- ni|J 4 for global observables A; thus, for very large N (> 10 5 ), the CPU time 
of the algorithm could end up being dominated by the thermalization. Nevertheless, 
one must resist the temptation to cut corners here, as even a small initialization 
bias can lead to systematically erroneous results, especially if the statistical error is 
small [39]. Some modest gain can probably be obtained by using closer-to-equilibrium 
initial configurations (e.g. [168]), but it is still prudent to take ridisc at least several 
times N I f . 

Initialization will become a more important issue in the future, as faster computers 
permit simulations at ever-larger chain lengths. 

6.5 Fixed-iV, Fixed-x Algorithms 

This ensemble has not been considered until very recently; it seems difficult to 
devise algorithms which are ergodic under the severe constraints of fixed chain length 
and fixed endpoints. 

6.5.1 Bilocal Algorithms 

I do not know whether there are any bilocal (or mixed local/bilocal) algorithms 
that are ergodic for this ensemble. 
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6.5.2 Cut-and-Paste Algorithms 

These algorithms were devised by Dubins, Madras, Orlitsky, Reeds and Shepp 
[180, 179]. In dimension d = 2 the simplest cut-and-paste algorithm uses two moves: 

• Inversion of the subwalk (Figure 8). 

• If lo\ — LOk lies on a ±45° diagonal line, then one can reflect the subwalk 
through the perpendicular bisector of the segment LOk^h followed by an inver- 
sion. 

This algorithm is ergodic [179]. In dimension d > 3 one must adjoin coordinate- 
interchange moves, and the algorithm is then again ergodic [179]. 

The dynamic critical behavior of this algorithm is unclear. One's first guess is 
a behavior similar to that of the pivot algorithm, i.e. an acceptance fraction / ~ 
N~ q for some small power q, and an autocorrelation time r,- ni|J 4 ~ N q for global 
observables. However, one should remember that the diagonal-reflection moves — 
which are necessary for ergodicity 46 — are possible only for a fraction ~iV _i/ of pairs 
k,l. So perhaps one should expect r,- ni|J 4 ~ N q+V . (Numerically, this might be seen 
most easily by looking at an observable which is sensitive to the diagonal-reflection 
moves, e.g. the numbers iVi, . . . ,Nj, of steps along each axis.) On the other hand, 
failures of diagonal reflection due to disallowed u>i — Uk can be detected in a CPU 
time of order 1; so such failures may not affect the scaling of the product Ti ni ^cpu ■ 

See [181] for a first study of the cut-and-paste algorithm (on the face-centered- 
cubic lattice). 

6.6 Variable-iV, Variable-x Algorithms 
6.6.1 Slithering- Tortoise (Berretti-Sokal) Algorithm 

The slithering-tortoise algorithm 47 [182] is a Markov chain with state space 

oo 

S = U Sn and invariant probability distribution (2.17) with p = 0, i.e. 

N=0 

wp(u)) = const x /3 M . (6.5) 

The algorithm's elementary moves are as follows: either one attempts to append a 
new step to the walk, with equal probability in each of the 2d possible directions; or 
else one deletes the last step from the walk. In the former case, one must check that 
the proposed new walk is self-avoiding; if it isn't, then the attempted move is rejected 
and the old configuration is counted again in the sample ("null transition"). If an 
attempt is made to delete a step from an already-empty walk, then a null transition 

46 Without them, the numbers N\, . . . , Nd of steps along each axis would be separately conserved. 

47 Because the walk sticks out and retracts its "head" , like a tortoise. This algorithm is also known 
as the Berretti-Sokal algorithm, or BS for short. 
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is also made. The relative probabilities of AiV = +1 and AiV = — 1 attempts are 
chosen to be 



P(AN = +1 attempt) = 2 ^ (6.6) 

1 I 2 d p 

P(AN = -1 attempt) = (6.7) 

It is easily verified that this transition matrix satisfies detailed balance for the distri- 
bution TTp. It is also easy to see that the algorithm is ergodic: to get from a walk u 
to another walk u/, it suffices to use AiV = — I moves to transform u into the empty 
walk, and then use AiV = +1 moves to build up the walk u' . 

Regarding the critical slowing-down of the slithering-tortoise algorithm, we can 
argue heuristically that 

r ~ (iV) 2 . (6.8) 

To see this, consider the quantity Nit) = \u\(t), the number of steps in the walk 
at time t. This quantity executes, crudely speaking, a random walk (with drift) on 
the nonnegative integers; the average time to go from some point N to the point 
(i.e. the empty walk) is of order N 2 . Now, each time the empty walk is reached, all 
memory of the past is erased; future walks are then independent of past ones. Thus, 
the autocorrelation time ought to be of order (iV 2 ), or equivalently (iV) 2 . 

This heuristic argument can be turned into a rigorous proof of a lower bound 
T~ex P <L Ti n t^N > const X (N) 2 [69]. However, as an argument for an upper bound of the 
same form, it is not entirely convincing, as it assumes without proof that the slowest 
mode is the one represented by N(t). With considerably more work, it is possible to 
prove an upper bound on r that is only slightly weaker than the heuristic prediction: 
T~ex P < const X (iV) 1+7 [69, 183, 184]. (Note that the critical exponent 7 is believed 
to equal 43/32 in dimension d = 2, sal. 16 in d = 3, and 1 in d > 4.) In fact, there is 
reason to believe [185] that the true behavior is r ~ (N) p for some exponent p strictly 
between 2 and 1 + 7. A deeper understanding of the dynamic critical behavior of the 
slithering-tortoise algorithm would be of definite value. 



6.6.2 Join-and-Cut Algorithm 

The behavior r J> (N) 2 of the slithering-tortoise algorithm is in fact characteristic 
of any variable- N algorithm whose elementary moves make bounded changes in N: 
roughly speaking, N must perform a random walk on the nonnegative integers, and 
the autocorrelation time of such a random walk satisfies 



r > var(iV) = (N 2 ) - (N) 2 ~ (N) 2 (6.9) 

[132, Theorems A. 6 and A. 7]. Therefore, if one wants to do better than r > (N) 2 , it 
is necessary to make unbounded changes in N. 

An amazingly simple way of doing this was proposed recently by Caracciolo, Pelis- 
setto and Sokal [165]. Their algorithm works in the unorthodox ensemble T^ tot consist- 
ing of all pairs of SAWs {u^-\u^) [each walk starts at the origin and ends anywhere] 
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such that the total number of steps in the two walks is some fixed number N tot : 

T Ntot = {(J 1 ), J 2 )): lo^\lo^ are self-avoiding, with + \ = N tot } 

N tot 

= (J (S Nl x S Ntot - Nl ) . (6.10) 

JV 1= 

Each pair to^) in this ensemble is given equal weight; therefore, the two walks 

are non-interacting except for the constraint on the sum of their lengths. One sweep 
of the algorithm consists of two steps: 

(a) We update independently each of the two walks, using some iV-conserving er- 
godic algorithm (e.g. the pivot algorithm). 

(b) We perform a join-and-cut move: we concatenate the two walks to^ and 
u^ 2 \ forming a new (not necessarily self- avoiding) walk o co^; then we cut 
u/ 1 ) o uj( 2 } at a random position, creating two new walks u'^ and u'^ 2 \ If u'^ 

(2) 

and u' y ' are both self-avoiding, we keep them; otherwise the move is rejected 
and we stay with and u^ 2 \ 

The join-and-cut move is illustrated in Figure 14. 

Since the algorithm used in step (a) is ergodic in the ensemble of fixed-length 
walks, it is easy to see that the full algorithm is ergodic. (If and are per- 
pendicular rods, then the join-and-cut move will always succeed.) It is also easy to 
see that the algorithm satisfies the detailed-balance condition with respect to the 
equal-weight measure 

k(J'\J 2 )) = — for each (^V (2) ) £ T Ntot , (6.11) 

where 

Ntot 

Z(Ntot) = Y, cJViCjVtot-JVi • (6-12) 

JV 1= 

Therefore, from the probability distribution of the random variable Ni = Icl/ 1 ^ (the 
length of the first walk) in the measure (6.11), we can obtain estimates of the critical 
exponent 7 by the maximum-likelihood method (see Section 7.3). Since the join- 
and-cut move can make large jumps in Ni in a single step, this evades the bound 
(6-9). 

The dynamic critical behavior of the pivot + join-and-cut algorithm was studied 
in [165] by both analytical and numerical methods. For the relevant observable X = 
log[iVi(iVt t — iVi)], the autocorrelation time in units of elementary moves is found 
to grow as Ti nt: x ~ ^tS' 7 ° in = 2. On the other hand, the mean CPU time per 
elementary move behaves as Tcpu ~ jV* - 81 ^ j us t as in the pivot algorithm. Hence 
the autocorrelation time in CPU units behaves as Ti ntj xTcpu ~ jV* 1 - 51 ^ which is 
a significant improvement over the r ~ N~ 2 of the slithering-tortoise algorithm. 
Moreover, the behavior is expected to be even better in higher dimensions. 
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Figure 14: Join-and-cut move: the two SAWs (upper figure) are concatenated (middle 
figure) and then cut at the point marked with a square (lower figure). Note that the 
concatenated walk need not be self-avoiding. 
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6.7 Variable-iV, Fixed-x Algorithms 

6.7.1 BFACF Algorithm 

The BFACF algorithm was invented by Berg and Foerster [186] and Aragao de 
Carvalho, Caracciolo and Frohlich [61, 187]. 48 It is a Markov chain with state space 

oo 

S(x) = U Sn(x) and invariant probability distribution (2.14) with p = 1, i.e. 
N=o 

tt p (u) = const x |cu|/3 M . (6.13) 
The elementary moves of the BFACF algorithm are the following local deformations: 

• The one-bead flip (move A of Figure 1), which has A A" = 0. 

• Kink insertion (move J of Figure 9), which has AA" = +2. 

• Kink deletion (move K of Figure 9), which has AA" = —2. 

Note that all these moves can be generated by displacing the middle bond of a three- 
bond group by one lattice unit perpendicular to itself in one of the 2d — 2 possible 
directions. Therefore, one iteration of the BFACF algorithm consists of the following 
operations: 

1) Choose at random a bond of the current walk u (with equal probability for each 
bond). 

2) Enumerate the 2d — 2 possible deformations of that bond; choose randomly 
among these deformations, giving each deformation a probability p(AN) de- 
pending only on A A" = — (If the sum of these probabilities is s < 1, 
then make a "null transition" u — > u with probability 1 — 5.) The probabilities 
p(AN) will be specified below. 

3) Check whether the proposed new walk u' is self-avoiding. If it is, keep it; 
otherwise, make a null transition. 

This algorithm satisfies detailed balance for TTp provided that 

p(+2) = f3 2 p(-2) . (6.14) 

On the other hand, the sum of the probabilities must in all cases be < 1; this imposes 
the inequalities [132] 

[l + (2d-3)f3 2 ]p(+2) < (3 2 (6.15a) 
2p(0) + (2d-4)p(+2) < 1 (6.15b) 

48 The exposition in the original paper [187] suffers from an unfortunate confusion regarding the 
meaning of p(AN). Here I follow the corrected exposition in [132, Section 3.1]. 
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A nearly optimal choice is 



P <-' 2 > = l + (J-3)y (6 ' lfa) 
= 2[l+ 1 (M-3)y] (6 ' 16b) 

* +2 > = l + (2f-3)y (6 ' 16C) 

(see [132] for further discussion). 

The ergodicity of the BFACF algorithm is an extremely subtle problem. The 
known results are the following: 

• In dimension d = 2, the BFACF algorithm is ergodic, for all choices of x [11]. 

• In dimension d = 3, the algorithm is non-ergodic if ||:e||oo = max(|xi|, \x 2 \ } |x 3 |) = 
1, due to knotted configurations which cannot be untied [188]. 49 However, it 
can be made ergodic 50 by adjoining the three-bead crankshaft move (I of Figure 
3) [188]. 

• In dimension d = 3, the algorithm is ergodic if ||:e||oo > 2 [189]. (It is then 
possible to disentangle any knot, no matter how large and complicated, by a 
motion which passes one strand at a time between the endpoints.) 

• In dimension d > 4, the algorithm is presumably ergodic for all x, but a rigorous 
proof appears to be lacking. 

The dynamical behavior of the BFACF algorithm is also an extremely subtle prob- 
lem. The known results are the following: 

• Texp = +°° f° r all f3 > [190]. This surprising result arises from the existence 
of arbitrarily slowly-relaxing modes associated with sequences of transitions 
to —>■•••—>■ u' with A(u,u') ^> max(|cu|, |u/|), where A(u,u') is the minimum 
surface area spanned by the union of u and u' . 

• Tint,A ^ const X [{A 2 ) — {A) 2 ] } where A = A(lo } lo°) for some fixed walk u° 
from to x [132]. Assuming the usual scaling behavior [191], this implies 

r m t,A £ (ivr. 

• Numerically, it appears that r 8 ' ntj jv ~ {N) p with p equal or very nearly equal to 
4;/ [191]. 



49 For the BFACF algorithm applied to unrooted polygons (ring polymers), the ergodic classes are 
precisely the knot classes [188]. This is probably true also for the BFACF algorithm applied to walks 
with ||;c||oo = 1, but it has apparently not yet been proven. 

50 At least in the case of unrooted polygons. It is probably true also for the BFACF algorithm 
applied to walks with ||«||oo = 1; but it has apparently not yet been proven. 
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However, in dimension d = 3 (for ||:e||oo > 2) the relaxation of long chains may (pos- 
sibly) be further slowed down by the appearance of "quasi-knots" , i.e. configurations 
whose decay is extremely slow due to the complicated sequence of moves needed to 
untie them. Since untying a quasi-knot requires passing the end of the chain through 
the knotted region (which may be near the middle of the chain), this process be- 
comes more time-consuming the larger (N) is. This effect might possibly increase 
the dynamic critical exponent above 4z/, at least for observables that are sensitive to 
(quasi-)knots. 51 

6.7.2 BFACF + Cut-and-Paste Algorithm 

Caracciolo, Pelissetto and Sokal [132] have recently proposed to speed up the 
BFACF algorithm by adjoining some non-local (cut-and-paste) moves. The hope is 
that these non-local moves will destabilize the long-lived (metastable) configurations 
that are responsible for the slow equilibration of the BFACF algorithm. Thus, the al- 
gorithm is a hybrid in which the non-local moves hopefully assure the rapid equilibra- 
tion within subspaces of fixed N, while the local (BFACF) moves assure equilibration 
between different N (and in particular make the algorithm ergodic). The algorithm 
has a free parameter p n \ — the percentage of non-local moves — which can be tuned 
as a function of (N) to optimize the computational efficiency. 

The best one can hope for is an autocorrelation time r ~ (N) 2 : for even if the 
non-local moves were to cause instant equilibration at fixed N, the local moves would 
still have to carry out a random walk in N. Such a behavior, if achieved, would 
be a significant improvement over the pure BFACF algorithm. This estimate refers, 
however, to physical time units; since the non-local moves require a mean CPU time 
per move that grows as a fractional power of (N), it is a subtle matter to choose 
p n i so as to minimize the autocorrelation time as measured in computer (CPU) time 
units. 

Numerical experiments in d = 2 [132] confirm that the autocorrelation time (in 
units of elementary moves) at fixed p n \ > scales as r^jy ~ (A^)~ 2 . Taking into 
account the CPU time, it is found that the optimal p n \ scales as ~1/(A^)~ ' 8 , and the 
autocorrelation time in CPU units then scales as Ti ni ^Tcpu ~ (A^)~ 2 ' 3 . In practice, 
at (N) 100 it is found that the physical (resp. CPU) autocorrelation time of the 
hybrid algorithm with p n \ = 0.05 is a factor 6 (resp. 4) smaller than that of the pure 
BFACF algorithm. The BFACF + cut-and-paste algorithm provides, therefore, a 
significant (though not exactly earth-shaking) improvement over previous algorithms 
for variable- N, fixed-endpoint SAWs. But more research is needed to establish con- 
clusively the dynamic critical behavior of this algorithm. 

51 For example, any one of the standard knot invariants [192, 193] applied to ui o ui° , where ui° is 
some fixed path from x to in R \ Z . 
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7 Miscellaneous Issues 



7.1 Data Structures 

The new algorithms described in Section 6 are potentially very efficient; but in 
order to realize this potential, it is necessary to choose carefully the data structures 
and the low-level "bookkeeping" algorithms. For example, a local or bilocal update 
can be carried out, with suitable data structures, in a CPU time of order 1 (as one 
would expect); but a naive choice of data structures might force us into "garbage 
collection" (mass copying of data from one storage location to another) costing a 
time of order N. Likewise, checking the global self-avoidance of a walk u (as arises 
e.g. in the pivot algorithm) can be carried out, with suitable data structures, in a 
CPU time of order iV; but the naive method (comparing Ui to Uj for each pair 
takes a time of order iV 2 — which would nullify the advantages of the pivot algorithm. 

We divide the discussion of data structures into two parts: those needed in updat- 
ing the walk, and those needed in testing self-avoidance . As will be seen, it is usually 
necessary to maintain redundant data structures, in order that both operations can 
be carried out efficiently. For the first task, one typically uses linear (or circular) 
lists of various kinds; for the second task, one uses bit tables or hash tables. A 
lucid discussion of these structures can be found in [94, Chapters 11 and 12]. 

7.1.1 Updating the Walk 

Each of the algorithms described in Section 6 requires one or more of the following 
operations in order to update the walk: 

1. Choose a random site (or a random step) of the walk. 

2. Update the coordinates of a given site of the walk. 

3. Find the successor (or predecessor) of a given site on the walk. 

4. Insert one or more new sites 

(a) at the beginning of the walk 

(b) at the end of the walk 

(c) in the interior of the walk 

5. Delete one or more sites from (a), (b) or (c). 

Depending on the particular operations needed, we will choose to represent the walk 
as one or another type of list. 

The simplest type of list is a sequentially allocated linear list: the walk 
coordinates lo , . . . , cujv are stored in order in successive memory locations (e.g. a 
Fortran array). Clearly this structure permits the first three operations to be carried 
out in a time of order 1. In particular, this suffices for local A^-conserving moves, 
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pivot moves, and cut-and-paste moves. 52 Insertion and deletion at the beginning and 
end of the walk can also be carried out in a time of order 1, provided that an upper 
bound is known in advance on how far the walk can grow in each direction (so that 
sufficient storage can be allocated). This suffices for the slithering-tortoise algorithm, 
because one can take e.g. N max 70 (N) and be virtually certain that this walk length 
will never be exceeded (at least not before the sun runs out of hydrogen and engulfs 
the earth). 

For the slithering-snake (reptation) algorithm one should use a sequentially allo- 
cated circular list: the walk coordinates are stored in sequential memory locations, 
but in general in a cyclically permuted order, i.e. u>j,Uj + i, . . . , cujv , , ^1 , • • • A 
pointer then indicates which element is lo . (A flag may also indicate which direction 
is "forward": this is useful in implementing the first version of the algorithm [146, 
pp. 4592-4593], in which "head" and "tail" are sometimes interchanged.) 

None of the foregoing structures are adequate for those algorithms which insert 
or delete sites in the interior of the walk (BFACF, most bilocal algorithms, etc.): 
insertion or deletion of even one site would necessitate shifting a large part of the 
list ("garbage collection"), which would take a time of order N. A more flexible 
data structure is the doubly linked list: here the walk coordinates cu , . . . } lon may be 
stored anywhere in memory (neither contiguously nor in order); but together with 
each coordinate Ui we maintain link fields £~(i) and £ + (i) } which indicate the storage 
locations where the LOi-i and respectively, are to be found. Unused but available 

memory is also stored in a linked list, the so-called free list (here the two-way linking 
is unnecessary). It is easy to insert or delete into the interior of a doubly linked 
list, by using the link fields £~(i) and £ + (i) to locate the predecessor and successor 
sites (which take part in the relinking). On the other hand, since the {u;i} may be 
scattered throughout memory, it is not easy to choose a random site or step of the 
walk. (One would have to "thread through" the list sequentially, taking a time of 
order N.) 

To get the virtues of both sequential allocation and linked list, one can use a 
contiguously allocated doubly linked linear list (see e.g. [132, Section 3.3]). 



keep track of the sequence of steps along the walk, we use forward pointers {£ + (i)}fL 
and backward pointers {£~ (i)}fLo] here £ + (i) [resp. £~ (i)] is the index corresponding 
to the site following [resp. preceding] the site whose index is z, or —1 if no such site 
exists. It is often convenient that the initial and final points of the walk be assigned 
to indices and 1, respectively. Therefore, 

52 In the case of pivot or cut-and-paste moves, one maintains two linear lists — one "active" 
and one "scratch" — together with a flag saying which is currently which. The coordinates of a 
proposed new walk are successively calculated and placed in the "scratch" list, and self-avoidance is 
simultaneously tested (see Section 7.1.2); if the self-avoidance test is passed, the flag is flipped and 
the "scratch" list now becomes "active" . 
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s(0) = L0 (=0) 

s(£ + (0)) = uj, 
s(£ + (£ + (0))) = u 2 



and so on. Likewise, 



5(1) = u N (=x) 
s{£-{1)) = LO N _ X 

i£-{£-{l))) = LON-2 



and so on. It is then trivial to choose a random site of the walk, as this is equivalent 
to choosing a random index z. 53 It is also trivial to insert new sites: just put them in 
locations iV + 1, N + 2, etc. and then relink. Deletion requires "garbage collection" to 
maintain the contiguous allocation, but this also can be performed in a time of order 
1: move the entries from locations iV, N — 1, etc. into the just-vacated locations, with 
appropriate updates to the link fields. So the contiguously allocated doubly linked 
linear list is the appropriate data structure for the BFACF and bilocal algorithms. 



7.1.2 Testing Self- Avoidance 

If the walk configuration were stored only as a linear list (whether sequentially 
allocated or linked), then this entire list would have to be searched each time we want 
to add one new site to the walk, in order to check the self- avoidance constraint. This 
would take a time of order iV, which is a disaster. 

The solution is to maintain two (redundant) data structures to store the current 
walk configuration: a linear or circular list as described in the preceding subsection, 
and a bit table or hash table. 54 The latter can be defined abstractly as data struc- 
tures that perform the following functions: Given a finite (but typically enormous) 
set K of "possible keywords", we wish to store a subset H C K (of cardinality < 
some maximum M) in such a way that, for any x £ K, the following operations can 
be carried out rapidly: 

1. Query. Is x £ HI 

2. Insertion. Insert x into H (if it is not in H already). 

3. Deletion. Delete x from H (if it is in H currently). 

53 Note that this works only for choice with uniform probability (or uniform excluding one or both 
endpoints). It would not work if one wanted to choose sites with a nonuniform probability depending 
on their location along the walk. 

54 To my knowledge, the first use of a bit table for self-avoidance checking was by McCrackin [194], 
and the first such uses of a hash table were by Alexandrowicz [97] and Jurs and Reissner [195]. 
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Here "rapidly" means "in a time of order 1, on the average". In our application, the 
set K of possible keywords will be the set of all points in some box B Cl d which is 
large enough to contain all possible points in the walk u: e.g. a cube of side > 2N 
centered at the origin for a fixed- N simulation, or a cube of side J> 140 (N) for a 
variable- N simulation. 

A bit table is simple to describe: it is a large block of memory in which each 
possible keyword x £ K (i.e. each site of the large box B) is assigned one distinct bit. 
That bit is set to 1 if x £ H (i.e. if the site in question currently belongs to the walk), 
and otherwise. Clearly, the three operations of query, insertion and deletion can 
each be carried out in a time of order 1. The trouble is that for large iV, the memory 
requirements become prohibitive, especially in dimension d > 2: e.g. at N = 1000 
the memory needed is 0.5 megabyte for d = 2, and 1 gigabyte for d = 3! 

The memory requirements can be reduced by a clever trick called the sliding bit 
table [196, 197]. This approach is adequate in algorithms in which the new sites are 
all being added (or old sites being deleted) in the same small vicinity. In practice this 
means the slithering-tortoise algorithm. 

A more general approach is provided by the hash table [94, Chapter 12] [198, 
Section 6.4]: An array of M words is assigned, and each keyword x £ K is assigned a 
primary address h(x) in this array. Since in general M <C \K\, the "hash function" h 
is necessarily many-to-one, i.e. many distinct keywords may share the same primary 
address, leading to the possibility of collisions. The various hash-coding algorithms 
are distinguished by the method they use to resolve collisions, i.e. to decide where to 
store a keyword if its primary address happens to be occupied by some other keyword. 
One of the simplest collision-resolution schemes is linear probing: if the primary 
address h(x) is occupied, the algorithm searches successively in addresses h(x) + 1, 
h(x) + 2, ... (modulo M) until it finds either the keyword x or an empty slot. Other 
approaches involve chaining via linked lists. 

In the worst possible case, a single query or insertion into a hash table containing 
N entries could take a time of order N. However, it can be shown [198, Section 6.4] 
that as long as the hash table does not get close to full (i.e. N does not get near Af), 
then the average time for a single query or insertion (assuming random distribution 
of the points) is of order 1. So the hash-table method is nearly as fast as the bit-table 
method, and far more space-effective. 

We remark that deletion from a linear-probing hash table requires some care: if 
done naively, entries can get "lost" [198, pp. 526-527]. However, these subtleties are 
irrelevant if deletions always occur in a last-in-first-out manner (as in the slithering- 
tortoise algorithm), or occur only when cleaning up the table at the end (as in the 
pivot and cut-and-paste algorithms). In the latter case it suffices to keep a linear list 
of the memory locations in which elements have been inserted; these locations can 
then be cleared at the end. 

Depending on the application, it may be desirable to maintain the bit table/hash 
table either as a permanent data structure (containing the current walk u) or as a 
scratch data structure (for checking self-intersection of proposed new walks u/); or it 
may be desirable to maintain one of each, together with a flag saying which is which. 
See [132, Section 3.3] and [165, Section 3.2] for details. 
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7.2 Measuring Virial Coefficients 

The virial coefficients Bk play a central role in the theory of dilute polymer so- 
lutions. But to measure them it is not necessary (or even desirable) to simulate a 
many-chain system; rather, it suffices to simulate k independent polymer chains and 
then measure a suitable overlap observable. Consider, for concreteness, the second 
virial coefficient b[ Ni,N2 \ defined by (2.10)-(2.12). We have 

= \{T(^\^y)) Nl , N2 , (7.1) 

where and are independent SAWs of N\ and iV 2 steps, respectively, and 
TV 1 ),^ 2 )) is the number of translates of which somewhere intersect 

TV 1 ),^ 2 )) = #{x G l d : J 1 ^ n (J 2) + x) ^ 0} . (7.2) 

So we can run in parallel two independent simulations (using e.g. the pivot algorithm), 
and then every once in a while measure the observable T 

The straightforward method for determining T (cu( 1 ),cu( 2 ))is to compute x = u\ — 

(2) 

LOj for each of the (Ni + l)(iV 2 + 1) pairs i,j, write these points into a hash table (see 
Section 7.1.2), and count how many distinct values of x are obtained. Unfortunately, 
this takes a CPU time of order N\N 2} i.e. order iV 2 if Ni = N 2 = N. By contrast, 
we expect that one "effectively independent" sample of the pair (u/ 1 ), u^) can be 
produced by the pivot algorithm in a CPU time of order N . So this approach would 
spend more time analyzing the data than generating it! 

Fortunately, there exist Monte Carlo algorithms which can produce an unbiased 
estimate of T{u^-\ Cl/ 2 )), with statistical errors comparable to or smaller than those 
already intrinsic in the observable T{u^-\ Cl/ 2 )), in a mean CPU time of order N. 
So the idea is to perform a "Monte Carlo within a Monte Carlo". At least two 
such algorithms are known: the "hit-or-miss" algorithm [39], and the Karp-Luby 
algorithm [199, 200]. See [96, Section 5.3] for a preliminary discussion, and [39] for a 
fuller account. 

The "hit-or-miss" algorithm can easily be generalized to compute higher virial 
coefficients. I do not know whether the Karp-Luby algorithm can be generalized in 
this way. 

7.3 Statistical Analysis 

For the most part, the statistical analysis of SAW Monte Carlo data uses the same 
methods as are employed in other types of Monte Carlo simulation. In particular, with 
dynamic Monte Carlo data it is essential to carry out a thorough autocorrelation 
analysis: only in this way can one test the adequacy of the thermalization interval 
and the run length and properly assess the statistical error bars. For details, see e.g. 
[36, Section 3], [37, Section 2], [96, Appendix C] and [11, Sections 9.2.2 and 9.2.3]. 

Typically one carries out fixed- N simulations at a (wide) range of values of N, 
and then uses weighted least-squares estimation [201] to extract the critical 
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exponent v and the various critical amplitudes. However, for high-precision work it is 
important to take account of corrections to scaling. According to renormalization- 
group theory [202, 203], the mean value of any global observable O behaves as N — > oo 
as 



(0) N = AN' P 



1 + — + — + H — H 1 h 

N N 2 N Al N Al+1 N Al+2 

+ J0_ _Ci_ _C2_ ' 

]yA 2 ]yA 2 +i ]yA 2 +2 



(7.3) 



Thus, in addition to "analytic" corrections to scaling of the form a^/N } there are 
"non- analytic" corrections to scaling of the form bk/N Al+k , Ck/N A2+k and so forth, 
as well as more complicated terms not shown in (7.3). The leading exponent p and 
the correction-to-scaling exponents Ai < A 2 < . . . are universal; p of course depends 
on the observable in question, but the A; do not. [Please note that the exponents 
Ai < A 2 < . . . have no relation whatsoever to the gap exponent A 4 defined in (2.9). 
The notation used here is standard but unfortunate.] The various amplitudes (both 
leading and subleading) are all nonuniversal. However, ratios of the corresponding 
amplitudes A, b and c (but not or the higher bk } Ck) for different observables are 
universal [52, 203]. 

Obviously it is hopeless to try to estimate from noisy Monte Carlo data more than 
the first one or two terms in (7.3), i.e. 



(0) N = AN* 



1 + 



N A i 



(7.4) 



where A = min(Ai, 1). A reasonable approach is as follows: First truncate the series 
at zeroth order ({0)n = AN' P ) and perform a weighted least-squares fit using only 
the data at N > N m i n} for a sequence of successively larger values N m i n . For each 
such fit, the x 2 value indicates whether the hypothesis (O)n = AN' P for N > N m i n is 
consistent with the data — or in other words, whether the corrections to scaling for 
N > N m i n (which surely exist) are statistically significant. 55 If they are not, then one 
is done; the estimates of p and A ought to be roughly independent of N m i n} within 
statistical error bars. If the corrections are significant, then one can insert the first 
correction-to-scaling term and redo the least-squares fit and % 2 test. 56 However, one 
must keep in mind that the estimate of A (and the correction amplitude) produced 
by such a fit is merely an effective exponent which may be imitating the combined 
effect of several correction-to-scaling terms over some particular range of N. Such 
an effective exponent is of no intrinsic physical interest; this approach should simply 



55 Statistical significance (resp. insignificance) of the corrections for N > N m i„ means only that 
these corrections are larger than (resp. comparable to or smaller than) the statistical errors in this 
particular simulation. By making sufficiently long runs, the statistical error bars can in principle 
be made arbitrarily small, and so the corrections to scaling will always eventually be found to be 
statistically significant. 

56 0ne may either make a guess for A and then estimate the amplitude via linear least-squares, or 
estimate simultaneously both A and the amplitude by nonlinear least-squares. 
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be thought of as a semi-empirical method for extrapolating the data to N — > oo. In 
principle the true A can be found by taking N m i n very large — large enough so that 
the second correction-to-scaling term is negligible compared to the first — but this 
means that the first correction-to-scaling term will also be negligible compared to the 
leading term, and the statistical errors in A and the corresponding amplitude will 
therefore be enormous. At present it seems feasible to obtain only rough estimates of 
A by direct Monte Carlo simulation [204, 39]. However, the situation may improve 
in the future, as more powerful computers become available. 

A novel point arises when estimating fi and 7 (or fi and a s i ng ) from a variable- 
N simulation. Thanks to (2.1) / (2.17) [or (2.2)/(2.14)], the probability distribution 
of chain lengths N is known exactly except for the two unknown parameters (/i,7) 
[or (fi } a s i ng )] } provided that we temporarily neglect corrections to scaling. This fact 
allows us to use maximum-likelihood estimation [201] to obtain estimates of // 
and 7 which are not only demonstrably optimal in a rigorous statistical sense — 
that is, they achieve the minimum possible mean-square error for a given quantity of 
Monte Carlo data — but which also provide a priori error estimates. This means that 
the statistical errors can be computed reliably, in advance of performing the Monte 
Carlo simulation. Or to put it more strikingly: before performing the simulation, one 
cannot know what the final central estimates will be, but one can know the error 
bars! See [182, Section 4.2] and [165, Section 4] for details. 

8 Some Applications of the Algorithms 

The development over the past decade of efficient Monte Carlo algorithms for the 
SAW (Section 6) has combined with recent advances in computer hardware (notably 
clusters of high-speed RISC workstations) to make possible high-precision studies of 
SAWs that would have been unimaginable only a few years ago. For example, a recent 
study [39] of SAWs in d = 2 and d = 3 has employed chain lengths up to N = 80000, 
obtaining error bars of order 0.1-0.3%. (To be sure, this work took several years of 
CPU time!) 

In this section I cannot hope to do justice to all the applications of the new 
algorithms. Rather, I shall limit myself to giving an informal account of a few illus- 
trative examples drawn from my own work (most of which is joint work with Sergio 
Caracciolo, Bin Li, Neal Madras and Andrea Pelissetto). 

8.1 Linear Polymers in Dimension d = 3 

Probably our most important work is a high-precision study of the critical ex- 
ponents v and 2A 4 — 7 (and in particular the hyperscaling law dv = 2A 4 — 7) and 
universal amplitude ratios for SAWs in both two and three dimensions, using the 
pivot algorithm [39]. In dimension d = 3, the renormalization-group prediction for 
the exponent v is 0.5880 ± 0.0015 [16, 17, 18, 19, 20] or 0.5872 ± 0.0014 [21]. However, 
this method is susceptible to serious (and quite possibly undetectable) systematic 
errors arising from a confluent singularity at the RG fixed point [25, 26]; this led me 
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to be rather skeptical of the claimed error bars. Indeed, when we began this work five 
years ago, the series-extrapolation [205] and Monte Carlo [206, 96] estimates of v were 
significantly higher, around 0.592 ± 0.0015 (one standard deviation). I was therefore 
looking forward to improving the statistics on the Monte Carlo studies by a factor of 
10 or so (i.e. reducing the error bar by a factor of about 3), so as to definitively rule 
out the RG prediction (or at least its claimed error bar). But what actually happened 
is rather different from what I had envisaged! 

The original Monte Carlo studies [206, 96] predicting v ~ 0.592 were based on 
walks of length N <J 3000. But as we studied longer and longer walks, the apparent 
exponent fell. 57 Eventually things stabilized, but we had to use walks up to N = 80000 
and (if we fit to a pure power law) to throw away all walks with N < 5000! And the 
value at which things stabilized was — surprise! — almost exactly the RG value: our 
preliminary estimate is v = 0.5879 ± 0.0005 (68% confidence limits), taking account 
of both statistical errors and corrections to scaling. (This estimate may change when 
we do the definitive data analysis, but probably not by much.) The moral is that 
corrections to scaling are an extremely serious effect in high-precision Monte Carlo 
studies; it is necessary to be very careful, and sometimes to go to enormous chain 
lengths, to escape from their effects. The extraordinary accuracy of the RG prediction 
remains a mystery (at least to me). 58 

Another startling conclusion from this study concerns the interpenetration ratio 
^ [cf. (2.13)], and goes to the heart of polymer theory. But this requires a brief 
historical digression. 

For several decades, most work on the behavior of long-chain polymer molecules 
in dilute solution [208, 12, 5, 209, 15, 210] has been based on the so-called "two- 
parameter theory" in one or another of its variants: traditional (Flory-type) 59 , pseudo- 
traditional (modified Flory-type) 60 or modern (continuous-chain-type) 61 . All two- 
parameter theories predict that in the limit of zero concentration, the mean-square 
end-to-end distance (i?g), the mean-square radius of gyration (Rg) and the interpen- 
etration ratio ^ depend on the degree of polymerization N (or equivalently on the 
molecular weight M = N M monomer ) according to 



57 For example, a recent study of Eizenberg and Klafter [178] using walks of length N < 7200 
found v x, 0.5909 ± 0.0003. 

58 It may be related to the apparent fact [207, 26] that the confluent exponent A2/A1 is very close 
to an integer (namely, it is x 2). 

59 See Yamakawa [12], Sections 11 and 16 (pp. 69-73 and 110-118) and parts of Sections 15, 20b 
and 21b (pp. 94-110, 153-164 and 167-169). See also DesCloizeaux and Jannink [15], Section 8.1 
(pp. 289-313). 

60 See Yamakawa [12], most of Section 15 (pp. 94-110) and parts of Sections 20b and 21b (pp. 153- 
164 and 167-169). See also Domb and Barrett [211]. 

61 These theories take as their starting point the Edwards model of a weakly self-avoiding continu- 
ous chain [212, 213, 214, 215, 216, 209, 15]. (The Edwards model is also equivalent to the continuum 
ip 4 field theory with n = components.) See DesCloizeaux and Jannink [15] for a detailed treatment 
of the Edwards model. 
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(R 2 g ) = AN F Rg (bN) 

= Fy(bN) 



(8.1b) 
(8.1c) 



where F Re} F Rg} Fq, are claimed to be universal functions (which each specific two- 
parameter theory should predict), and A and b are non-universal scale factors de- 
pending on the polymer, solvent and temperature but independent of N. [The con- 
ventional notation is a 2 R = Fr b , a 2 s = 6-Fr , h = a d s F^ / ' z and z = (bN) 2 ~ d l 2 in spatial 
dimension d.\ Moreover, virtually all the theories — and in particular the modern 
continuous-chain-based theories — predict that Fq, is a monotone increasing and con- 
cave function of its argument bN } which approaches a limiting value ^* 0.2 — 0.3 
as bN — > oo. 

But our Monte Carlo data show precisely the opposite behavior: ^ is a decreasing 
and convex function of iV, which approaches a limiting value ^* 0.247 as N — > oo 
(Figure 15). The same behavior was found by Nickel [203]. Indeed, there is exper- 
imental evidence that for real polymers in a sufficiently good solvent, the approach 
to ^* is also from above, contrary to the two-parameter theory [217, 218, 219, 210]. 
This behavior was considered to be a perplexing "anomalous effect", and various 
explanations were advanced [220, 221, 218]. What is going on here? 

The correct explanation, in my opinion, was given two years ago by Nickel [203] 
(see also [222, 223]): theories of two-parameter type are simply wrong. Indeed, they 
are wrong not merely because they make incorrect predictions, but for a more fun- 
damental reason: they purport to make universal predictions for quantities that are 
not in fact universal. Two-parameter theories predict, among other things, that ^ 
is a universal function of the expansion factor a 2 s = (R 2 ) / {R 2 )T g ] in particular, is 
claimed to depend on molecular weight and temperature only through the particular 
combination a|(Af, T). This prediction is quite simply incorrect, both for model sys- 
tems and for real polymers. Indeed, even the sign of the deviation from the limiting 
value ^* is not universal. 

All this has a very simple renormalization-group explanation [203], so it is surpris- 
ing that it was not noticed earlier. As mentioned already in Section 7.3, standard RG 
arguments predict, for any real or model polymer chain, the asymptotic behavior 

(R 2 e ) = A Re N 2 »(l + b Re N- A i +...) (8.2a) 
(R 2 g ) = A Rg N 2v {l + b Rg N~^ +...) (8.2b) 
= ^* (1 + byN~ Al + . . .) (8.2c) 

as N — > oo at fixed temperature T > Tg. The critical exponents v and Ai are 
universal. The amplitudes A Re} A Rg} b Re7 b Rg} bq, are nonuniversal; in fact, even the 
signs of the correction-to-scaling amplitudes b Re7 b Rg} bq, are nonuniversal. However, 
the RG theory also predicts that the dimensionless amplitude ratios A Rg /A Re} 
b Rg /b Re and bq,/b Re are universal [203, 52]. 

So there is no reason why the correction-to-scaling amplitudes should have any 
particular sign. In the continuum Edwards model, the effective exponents v e ff,R e = 
| d\og(R 2 ) I d\og N and v t g,R 9 = |^ l°g (-R^)/^ l°g N and the interpenetration ratio 
^ all approach their asymptotic values from below [209, 15, 22, 23, 24]: that is, 
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Figure 15: Interpenetration ratio ^ versus chain length iV, for SAWs in d = 3. Error 
bar is one standard deviation. Data from 
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bn e} bfi g > and bq, < 0. On the other hand, in lattice self-avoiding walks, these 
quantities approach their asymptotic values from above [203, 39]; and the same occurs 
in the bead-rod model with sufficiently large bead diameter [224]. Indeed, this latter 
behavior is almost obvious qualitatively: short self-avoiding walks behave roughly like 
hard spheres; only at larger N does one see the softer excluded volume (smaller *&) 
characteristic of a fractal object. All these models agree closely, as they should, for 
the leading universal quantities z/, Ar 9 / ' Ar p and and they agree reasonably well 
for the universal correction-to-scaling quantities Ai, &R g /&R e and bq,/bn e . 

In summary, the error of all two-parameter theories is to fail to distinguish cor- 
rectly which quantities are universal and which are non-universal. In particular, the 
modern two-parameter theory begins from one special model — the continuum Ed- 
wards model — and assumes (incorrectly) that it can describe certain aspects of 
polymer behavior (e.g. the sign of approach to ^*) which in reality are non-universal. 

However, this is not the end of the story: The continuum Edwards model does 
in fact describe universal properties of polymer molecules, albeit not the behavior as 
N — > oo at fixed temperature T > Tg. Rather, this theory describes the universal 
crossover scaling behavior in an infinitesimal region just above the theta temperature , 
namely the limit N — > oo, T — >■ Tg with x = N^(T — Tg) fixed, where cf> is a suitable 
crossover exponent. More precisely, for suitably chosen exponents <f> and vg } the 
following limits are expected to exist: 
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The exponents <f> and vg are universal, and the crossover scaling functions /r c , 
fn g and fq, are universal modulo a rescaling of abscissa and ordinate. The exponents 
are believed to take the values 
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The functions f Re and f R (and fq, at least for x > 0) are monotonically increasing 
functions of their argument x = N^(T — Tg), with the asymptotic behavior 

fRe(x),f R (x) ~ , 2{v „- Ve \u as x ^ +oc (8.6a) 

y —x) ^ coll "etiv as ™ 



. , J**(>0) as x ->■ +00 , . 

~ < , v / n \ (8.6b) 
L unknown (< 0) as x — > —00 

where v co \\ = 1/d. Then the claim [222] is that, for 3 < d < 4, the functions fn e (x) } 
/r (x) and fq>(x) for x > are given precisely by the continuum Edwards model, 
modulo the nonuniversal rescaling of abscissa and ordinate: 

f Re (x) = K x a\{K 2 x) (8.7a) 

f Rg (x) = (A^/6)4(^ 2 x) (8.7b) 

f^x) = h(K 2 x)/a d s (K 2 x) (8.7c) 

Here a R (z), ct 2 s (z) and h(z) = zh(z) are the conventional expansion and second virial 
factors of the continuum Edwards model [12, 15, 210, 22, 23, 24], and Ki and K 2 
are nonuniversal scale factors. Thus, the continuum Edwards model is a correct 
theory for a certain limiting regime in the molecular- weight /temperature plane — 
but this regime is not the one previously thought. The explanation of (8.7) relies 
on a Wilson-deGennes-type renormalization group [13, 227]; see [222] for details, and 
[223] for further discussion. 

It will be very interesting to test the predictions (8.7) numerically. But this will 
require better algorithms for simulating SAWs near the theta point (see Section 9.2). 



8.2 Linear Polymers in Dimension d = 2 

Dimension d = 2 is very special: for many statistical-mechanical systems, the 
critical exponents can be determined exactly (though non-rigorously) by Coulomb- 
gas [228, 229] and/or conformal-invariance [230, 231, 232] arguments. This is the case 
for the two-dimensional SAW, for which we know the exact exponents v = 3/4 and 
7 = 43/32 [228]. The unknown quantities in this model are the various universal 
amplitude ratios, which together determine the typical shape of a SAW and the 
strength of its interactions with other SAWs. The pivot algorithm has been employed 
[164] to obtain extremely accurate values for the limiting ratios 

Aoo = lim = 0.14026 ± 0.00011 (8.8) 



(Rl) 



N 



Boo = lim = 0.43962 ± 0.00033 (8.9) 
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In particular, this confirms the beautiful conformal-invariance prediction [233, 164] 

246 1 

— Aoo - 2B00 + - = . (8.10) 
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Another open question, which has attracted a lot of work, concerns the correction- 
to-scaling exponents in the two-dimensional SAW. We are currently using the pivot 
algorithm to investigate this question [204]. 

9 Conclusions 

9.1 Practical Recommendations 

What is the upshot of all this for the practicing polymer scientist, who wants to 
know which algorithm to use when? Here are a few recommendations: 

1) When simulating linear polymers for the purpose of studying global observables 
(e.g. the critical exponent z/, universal amplitude ratios, etc.), use the pivot 
algorithm. For initialization, use dimerization if this can be done in a CPU 
time less than half of your total planned run length; otherwise use the method 
proposed in [168], but be careful to discard at least ~N/f iterations at the 
beginning of the run. 

2) When simulating linear polymers for the purpose of studying local observables 
(e.g. number of bends, nearest-neighbor contacts, etc.), use the pivot algorithm 
as described above, but make sure that the run length is > lOOOAf// itera- 
tions. Alternatively, use the slithering-tortoise algorithm or the incomplete- 
enumeration algorithm. 

3) To obtain the critical exponent 7, use the join-and-cut algorithm (together with 
the pivot algorithm for the A^-conserving moves). 

4) To obtain the critical exponent a s i ng , use the BFACF + cut-and-paste algo- 
rithm. 

5) To obtain the connective constant //, use the slithering-tortoise algorithm or the 
incomplete-enumeration algorithm. 

Of course, these recommendations are not engraved in stone; other algorithms could 
be useful in certain situations. 

9.2 Open Problems 

There are numerous open problems concerning the behavior of the algorithms 
discussed in this article. Among the most important ones are: 

1) For d < 4, does there exist any static Monte Carlo algorithm for generating a 
random A^-step SAW (with exactly uniform distribution) in a mean CPU time 
that is bounded by a polynomial in A^? [Section 4.3] 

2) What is the precise behavior of the enrichment algorithm as iV -> 00? [Section 
5.2] 



68 



What is the precise behavior of the incomplete-enumeration algorithm as N — > 
oo? [Section 5.3] 

Are local iV-conserving algorithms necessarily nonergodic also in dimension d > 
4? [Section 6.4.1] 

What is the dynamic critical exponent of the various local iV-conserving algo- 
rithms (restricted to the ergodic class of a straight rod)? Is it exactly 2 + 2z/ for 
algorithms not having special conservation laws? What about for algorithms 
having special conservation laws, such as Verdier-Stockmayer? [Section 6.4.1] 

What is the dynamic critical exponent for the various bilocal (or mixed lo- 
cal/bilocal) algorithms? Is the conjecture r ~ iV 2 exact, approximate or wrong? 
[Section 6.4.2] 

Can we improve our theoretical understanding of the acceptance fraction and 
dynamic critical behavior of the pivot algorithm? [Section 6.4.3] 

Are there any bilocal (or mixed local/bilocal) algorithms that are ergodic for 
the fixed- N, fixed- £ ensemble? If so, what is their dynamic critical behavior? 
[Section 6.5.1] 

What is the dynamic critical exponent of the fixed- N cut-and-paste algorithm? 
[Section 6.5.2] 

What is the precise dynamic critical exponent of the slithering-tortoise algo- 
rithm? Is it strictly between 2 and 1 + 7? [Section 6.6.1] 

What is the precise dynamic critical exponent of the BFACF algorithm? Is it 
exactly 4z/? [Section 6.7.1] 

What is the precise dynamic critical behavior of the BFACF + cut-and-paste 
algorithm, as a function of (N) and p n i? [Section 6.7.2] 

Can the Karp-Luby algorithm be generalized to compute virial coefficients Bk 
for k > 3? [Section 7.2] 

In addition, there are many interesting open problems concerning the adaptation 
of these algorithms — or the invention of new algorithms — for polymeric systems 
more complicated than a single SAW (athermal linear polymer) in infinite space. For 
lack of space, I can merely list these problems and give a few (grossly incomplete) 
bibliographical references. 

SAWs in confined geometries. In recent years there has been much interest in 
studying SAWs attached to surfaces or confined to specified regions (wedges, slabs, 
tubes, etc.) [234, 235]. Most of the algorithms described here do generalize to such sit- 
uations; but it is no longer guaranteed that they are even ergodic, much less efficient. 
This requires a case-by-case study. See [40, Section 2.4.2] for some references. 
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SAWs with nearest-neighbor attraction (— > theta point). The transition of polymer 
conformation from the high-temperature (good-solvent) regime to the theta point to 
the collapsed regime is well modelled by the self-avoiding walk with nearest-neighbor 
attraction. One of the most fundamental problems in polymer physics is to under- 
stand quantitatively the details of this crossover, both in d = 3 [225, 226, 222] and 
in d = 2 [236]. The dynamic algorithms described here can easily be modified to 
handle a nearest-neighbor interaction, by inserting a Metropolis accept/reject step. 
But their efficiency may deteriorate markedly in the neighborhood of the theta point 
and even more drastically in the collapsed regime; this requires a detailed study for 
each algorithm. See [171, 172, 237, 238] for some recent preliminary work on the pivot 
algorithm; and see [40] for citations of older work. 

Branched polymers. In recent years much attention has been devoted to the theo- 
retical and experimental study of branched polymers, whose behavior is quite different 
from that of linear or ring polymers. The simplest case is that of branched polymers 
with fixed topology, such as star or comb polymers. Many of the algorithms for 
linear polymers can be adapted to this case, although both the ergodicity and the 
efficiency are nontrivial problems. See [40, Section 2.4.3] for some references, and see 
[169, 170, 171, 172, 174, 176] for recent work using the pivot algorithm. 

A more difficult problem is that of branched polymers with variable topology, 
such as arbitrarily-branched lattice trees. Early works used simple sampling [239], 
while more recent works have used variants of the slithering-tortoise or incomplete- 
enumeration algorithms [240]. However, in all these cases it has been difficult to 
generate branched polymers of more than 50 segments, because of the critical 
slowing-down. On the other hand, the need for large polymers is even more acute 
here than for linear or ring polymers, since one needs a rather large number of seg- 
ments in order to feel the full effects of the branching. (That is, one expects large 
corrections to scaling in which the effective exponents for small N are biased toward 
the unbranched-polymer values.) A very promising non-local algorithm was devised 
recently by Janse van Rensburg and Madras [241]. 

Multi-chain systems. Much current work in polymer science, both theoretical and 
experimental, focusses on semidilute and concentrated solutions and on melts. The 
simulation of multi-chain systems poses very difficult problems: for example, it is 
an open question whether any known algorithm is ergodic at any nonzero density! 
In the semidilute case, many of the algorithms described here can be applied with 
minor modification, and their performance in practice (if one disregards the ergodicity 
problem) will probably be similar to that for single chains. On the other hand, dense 
solutions and melts constitute a much more difficult problem, due to the possibility 
of "gridlock". Several algorithms have been proposed [242], involving both local- 
deformation and subunit-exchange moves, but the relaxation seems in general to be 
very slow. Progress in this area will probably require new physical and algorithmic 
ideas. See [40, Section 3] for some references. 
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